DIGITAL MORPHOGENESIS VIA SCHELLING SEGREGATION 



GEORGE BARMPALIAS, RICHARD ELWES, AND ANDY LEWIS-PYE 



Abstract. Schelling's model of racial segregation has been extensively studied but has
largely resisted rigorous analysis. According to the model, a large number of individuals
of two types have their behaviour specified by two parameters. First, each is concerned
only with their own 'neighbourhood', which is the set of individuals within a certain
distance w of their location. Second, each has an 'intolerance level' τ, which is the
proportion of their neighbourhood which they require to be their own type in order to
be happy. Unhappy pairs of individuals are then chosen uniformly at random and are
given the opportunity to swap locations, causing large segregated regions to form from
an initially mixed configuration.

In [2], Brandt, Immorlica, Kamath and Kleinberg provided the first rigorous analysis
of the 1 dimensional unperturbed version of Schelling's model, for the case τ = 0.5. Here
we provide a rigorous analysis of the model's behaviour more generally for τ ∈ [0, 1], and
establish that as one varies τ, surprising forms of threshold behaviour result, notably the
existence of situations where an increased level of intolerance leads almost certainly to
decreased segregation.


1. Introduction

A major achievement of the economist and game theorist Thomas Schelling was an
elegant model of racial segregation, first described in 1969 [17]. This work is often cited
(see for instance [7]) as one of the first agent-based computational economic models, and
variants continue to be a popular topic among researchers in this discipline [7, 4, 8, 9]. It is
also interesting to observe that Schelling's model can be understood within the framework
of morphogenesis, the analysis of how structure can arise from an initially random, or near
random configuration. There has been work on similar ideas from other quarters. For
instance, while Alan Turing is best known in the mathematical and computer science
communities for his work formalising the algorithmically calculable functions, his most
cited work [18] actually relates to morphogenesis. Turing was interested in understanding
certain biological processes: the gastrulation phase of embryonic development, the process
whereby dappling effects arise on animal coats, and phyllotaxy, i.e. the arrangement of
leaves on plant stems.

Authors are listed alphabetically. Andy Lewis-Pye was previously called Andrew Lewis and was
supported by a Royal Society University Research Fellowship. Barmpalias was supported by the
Research Fund for International Young Scientists number 611501-10168 from the National Natural
Science Foundation of China, and an International Young Scientist Fellowship number 2010-Y2GB03
from the Chinese Academy of Sciences; partial support was also received from the project Network
Algorithms and Digital Information number ISCAS2010-01 from the Institute of Software, Chinese
Academy of Sciences.

Schelling's model looks to describe how individuals of different races come to organise 
themselves geographically within a community, into segregated regions, each of largely 
one race. Part of the significance of this model at the time was that it provided proof of a 
recurrent theme in Schelling's work - that the behaviour of individuals according to their 
interests at the local level, can produce results at the global level which are undesired by 
all. Running small simulations of his model, for example, Schelling observed that large 
levels of segregation can result in communities, when each individual has no preference for 
segregation, but still requires a certain low proportion of their own local neighbourhood to 
be of their own race. As noted by Zhang in [25] (which includes a good survey of research 
in the area), Schelling's model has every ingredient of a great theoretical work: it addresses 
an important real world issue, is simple to understand, produces unexpected results and 
it has deep implications for various social sciences. Although the explicit concern is racial 
segregation, the analysis is sufficiently abstract that any situation in which objects of two 
types arrange themselves geographically according to a certain preference not to be of 
a minority type in their neighbourhood, could constitute an interpretation. As pointed 
out in [20], for example, Schelling's model can be seen as a finite difference version of a 
differential equation describing interparticle forces. While the model has been extensively 
studied, see also for example [1, 13, 6, 11, 12, 16, 23, 24], it has been difficult to rigorously 
prove that the model produces the segregation behaviour clearly observed when running 
simulations. 

We concentrate here on the one-dimensional version of the model, as in [2]. The model
works as follows. One begins with a large number n of nodes (individuals) arranged in a
circle. Each node is initially assigned a type, and has probability 1/2 of being of type α and
probability 1/2 of being of type β (the types of distinct individuals being independently
distributed). We fix a parameter w, which specifies the 'neighbourhood' of each node
in the following way: at each point in time the neighbourhood of the node u, denoted
N(u), is the set containing u and the w-many closest neighbours on both sides - so the
neighbourhood consists of 2w + 1 many nodes in total. The second parameter τ ∈ [0, 1]
specifies the proportion of a node's neighbourhood which must be of their type before
they are happy. So, at any given moment in time, we define u to be happy if at least
τ(2w + 1) of the nodes in N(u) are of the same type as u. One then considers a discrete
time process, in which, at each stage, one pair of unhappy individuals of opposite types
is selected uniformly at random and is given the opportunity to swap locations. We
work according to the assumption that the swap will take place as long as each member
of the pair has at least as many neighbours of the same type at their new location as at
their former one (note that for τ ≤ 1/2 this will automatically be the case). The process
ends when (and if) one reaches a stage at which there are no longer unhappy individuals
of both types.
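
The process just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation program used for the figures: the function names (`own_type_count`, `is_happy`, `try_swap`, `run`), the encoding of the types α and β as the characters 'a' and 'b', and the `max_steps` cap are all choices made here for illustration.

```python
import random


def own_type_count(config, u, w):
    """Number of nodes in N(u) = [u-w, u+w] (indices mod n) sharing u's type, u included."""
    n = len(config)
    return sum(config[(u + i) % n] == config[u] for i in range(-w, w + 1))


def is_happy(config, u, w, tau):
    """u is happy if at least tau*(2w+1) nodes of N(u) share its type."""
    return own_type_count(config, u, w) >= tau * (2 * w + 1)


def try_swap(config, u, v, w):
    """Swap the individuals at u and v unless either would end up with fewer
    same-type neighbours than before. Returns True if the swap is kept."""
    before_u = own_type_count(config, u, w) - 1  # exclude the mover itself
    before_v = own_type_count(config, v, w) - 1
    config[u], config[v] = config[v], config[u]
    # The individual that started at u now sits at v, and vice versa.
    after_u = own_type_count(config, v, w) - 1
    after_v = own_type_count(config, u, w) - 1
    if after_u >= before_u and after_v >= before_v:
        return True
    config[u], config[v] = config[v], config[u]  # undo the swap
    return False


def run(n, w, tau, seed=0, max_steps=10_000):
    """Run the swap process from a uniformly random initial configuration."""
    rng = random.Random(seed)
    config = [rng.choice("ab") for _ in range(n)]
    for _ in range(max_steps):
        unhappy = [u for u in range(n) if not is_happy(config, u, w, tau)]
        unhappy_a = [u for u in unhappy if config[u] == "a"]
        unhappy_b = [u for u in unhappy if config[u] == "b"]
        if not unhappy_a or not unhappy_b:
            break  # no unhappy individuals of both types remain
        try_swap(config, rng.choice(unhappy_a), rng.choice(unhappy_b), w)
    return config
```

Recomputing the unhappy nodes at every stage keeps the sketch short at the cost of speed, so it is only suitable for small n. Passing τ as a `fractions.Fraction` rather than a float avoids rounding effects in the threshold comparison.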




Much of the difficulty in providing a rigorous analysis stems from the lack of a well
defined unique stable distribution for the underlying Markov process. Various authors
have therefore worked with variants of the model in which perturbations are introduced
into the dynamics so as to avoid this problem, i.e. the model is altered by introducing
a further random element, allowing individuals to sometimes make moves which are
detrimental with respect to their utility function. Such changes in the model might be
justified by the assumption that we are dealing with individuals of 'bounded rationality'
- one might consider that precise information concerning the racial composition of each
neighbourhood is not available, for example. Young used techniques from evolutionary
game theory - an analysis in terms of stochastically stable states - to develop the first
results along these lines [22]. In a paper by Pancs and Vriend [16] and in a number of
papers by Zhang [23, 24, 25], these ideas have been substantially developed, with the results
generally being that complete global segregation is the only possible long run outcome.

In [2], Brandt, Immorlica, Kamath and Kleinberg used an analysis of locally defined
stable configurations, combined with results of Wormald [21], to provide the first rigorous
analysis of the unperturbed one-dimensional Schelling model, for the case τ = 1/2. The
results obtained there are very different to those for the perturbed models: in the final
configuration the average length of a maximal segregated region is independent of n and
only polynomial in w. So now the local dynamics do not induce global segregation in
proportion to the size of the society but instead only a small degree of segregation at the
local level. The suggestion is that these results are in accord with empirical studies of
residential segregation in large populations [5, 10, 19].

In this paper we shall consider what happens more generally for τ ∈ [0, 1] (for the
unperturbed model), and we shall observe, in particular, that some remarkable threshold
behaviour occurs. While some aspects of the approach from [2] remain, in particular
the focus on locally defined stable configurations which can be used to understand the
global picture, the specific methods of their proof (the use of 'firewall incubators', and
so on) are entirely specific to the case τ = 1/2, and so by and large we shall require
different techniques here. The picture which emerges is one in which one observes different
behaviour in five regions. For κ which is the unique solution in [0, 1] to:

\[ \left(\tfrac{1}{2} - \kappa\right)^{1 - 2\kappa} = (1 - \kappa)^{2(1 - \kappa)} \]

(which is just slightly less than √2/4) these regions are: (i) τ < κ, (ii) τ = κ, (iii) κ <
τ < 1/2, (iv) τ = 1/2, (v) τ > 1/2. In fact we shall not consider the case τ = κ, but the
behaviour for all other values of τ is given by the theorems below. Perhaps the most
surprising fact is that, in some cases, increasing τ almost certainly leads to decreased
segregation. The assumption is always that we work with n ≫ w, i.e. all results hold for
all n which are sufficiently large compared to w. A run of length d is a set of d-many
consecutive nodes all of the same type. Complete segregation refers to any configuration
in which there exists a single run to which all α nodes belong.

Theorem 1.1. Suppose τ < κ and ε > 0. For all sufficiently large w, if a node u is
chosen uniformly at random, then the probability that any node in N(u) is ever involved
in a swap is < ε. Thus there exists a constant d such that, for sufficiently large w, the
probability u belongs to a run of length > d in the final configuration is < ε.

Theorem 1.2. Suppose τ ∈ (κ, 1/2) and ε > 0. There exists a constant d such that (for all
w and n ≫ w) the probability that u chosen uniformly at random will belong to a run of
length > e^{w/d} in the final configuration, is greater than 1 − ε.

Theorem 1.3 ([2]). Suppose τ = 1/2. There exists a constant c < 1 such that for all λ > 0,
the probability that u chosen uniformly at random will belong to a run of length greater
than λw² in the final configuration, is bounded above by c^λ.

Note that, if 1/2 < τ ≤ (w + 1)/(2w + 1) then the process is identical to that for τ = 1/2,
since in both cases a node requires w + 1 many nodes of its own type in its neighbourhood
in order to be happy.

Theorem 1.4. Suppose that τ > 1/2, and that w is sufficiently large that τ > (w + 1)/(2w + 1)
(so that the process is not identical to that for τ = 1/2). Then, with probability tending to 1 as
n → ∞, the initial configuration is such that complete segregation eventually occurs with
probability 1.
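
The remark before Theorem 1.4 rests on a small piece of integer arithmetic: a node needs at least τ(2w + 1) nodes of its own type, and a count of nodes is an integer, so the effective requirement is the ceiling of τ(2w + 1). The following sketch checks this for one value of w. The helper name `happiness_threshold` and the sample values are illustrative assumptions; exact rationals are used so that no floating point rounding enters.

```python
from fractions import Fraction
from math import ceil


def happiness_threshold(tau, w):
    """Smallest integer number of own-type nodes in N(u), u included, that makes u happy."""
    return ceil(tau * (2 * w + 1))


w = 10
boundary = Fraction(w + 1, 2 * w + 1)  # the value (w+1)/(2w+1)

# Every tau in [1/2, (w+1)/(2w+1)] yields the same requirement, w + 1.
for tau in (Fraction(1, 2), Fraction(51, 100), boundary):
    assert happiness_threshold(tau, w) == w + 1

# Just above the boundary the requirement jumps to w + 2.
assert happiness_threshold(boundary + Fraction(1, 1000), w) == w + 2
```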

We have constructed a program which efficiently simulates the process. The outcomes
of some simulations are illustrated in Figure 1. In the processes depicted here the number
of nodes n = 50000, and in the diagrams individuals of type α are coloured light grey and
individuals of type β are coloured black. The inner ring displays the initial mixed
configuration (in fact the configuration is sufficiently mixed that changes of type are not really
visible, so that the inner ring appears dark grey). The outer ring displays the final
configuration. Just immediately exterior to the innermost ring are second and third inner rings,
which display individuals which are unhappy in the initial configuration and individuals
belonging to 'stable' intervals in the initial configuration respectively (stable intervals will
be defined subsequently, and in fact, this third ring is empty in these examples except for
the case τ = 0.3). The process by which the final configuration is reached is indicated
in the space between the inner rings and the outer ring in the following way: when an
individual changes type this is indicated with a mark, at a distance from the inner rings
which is proportional to the time at which the change of type takes place. In fact, for the
case τ > 1/2 one has to be a little careful in talking about the 'final' configuration - it is
almost certainly the case that the numbers of each type in the initial configuration mean
there will always be unhappy individuals of both types, but once a completely segregated
configuration is reached all future configurations must remain completely segregated.







We shall also be interested in a variant of the model, which we shall call the simple
model, and which proceeds in the same way except that at each stage one unhappy node u
is selected uniformly at random and changes type so long as this does not cause it to have
fewer nodes of its own type within N(u). The process ends when no more legal changes are
possible. One might justify interest in this version of the model in two ways. Firstly there
are a number of situations in which it is much easier to work with (in one instance here,
and also in [2], results are proved for the standard model by first considering the simple
model and then arguing that the proof can be extended to the standard case). The simple
variant of the model also makes sense as soon as one drops the assumption that we are
working within a closed system. One might suppose that unhappy individuals living in
a city will move to a location in the same city if one should be available, but will move
elsewhere otherwise, and similarly that individuals will move into the city to fill locations
becoming vacant. In all cases we shall prove the same results for both models, except for
the case τ > 1/2, where the simple model eventually yields a society of nodes all of one
type. The rough conjecture is that the simple model accurately describes local behaviour
for the standard model.
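
A minimal sketch of the simple model follows, assuming the same conventions as before (types encoded as 'a' and 'b', indices taken modulo n). The names `run_simple` and `own_type_count` are illustrative. One deliberate simplification: the sketch draws uniformly from the unhappy nodes whose change of type is legal, rather than drawing from all unhappy nodes and discarding illegal choices; the two differ only in how many stages are wasted on rejected choices.

```python
import random


def own_type_count(config, u, w):
    """Number of nodes in N(u) = [u-w, u+w] (indices mod n) sharing u's type, u included."""
    n = len(config)
    return sum(config[(u + i) % n] == config[u] for i in range(-w, w + 1))


def run_simple(n, w, tau, seed=0, max_steps=100_000):
    """Simple model: unhappy nodes change type in place until no legal change remains."""
    rng = random.Random(seed)
    config = [rng.choice("ab") for _ in range(n)]
    size = 2 * w + 1
    for _ in range(max_steps):
        legal = []
        for u in range(n):
            own = own_type_count(config, u, w)
            # After changing type, u would count the (size - own) nodes of the
            # other type plus itself; the change is legal if that is not fewer.
            if own < tau * size and (size - own) + 1 >= own:
                legal.append(u)
        if not legal:
            break  # no more legal changes are possible
        u = rng.choice(legal)
        config[u] = "b" if config[u] == "a" else "a"
    return config
```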

In Section 2 we shall discuss terminology and we shall make some easy observations
about how the evolution of the model can be understood. In Section 3 we prove Theorem
1.1. In Section 4 we prove Theorem 1.2. In Section 5 we discuss the case τ = 1/2, which
was already dealt with in [2], and in Section 6 we prove Theorem 1.4.

2. Some notation, terminology and some easy observations 

In describing the model earlier, we talked in terms of nodes or individuals swapping 
locations at various stages of the dynamic process. To work in this way, however, requires 
one to draw a distinction between individuals and the locations they occupy at each 
stage, and to maintain a bijective map at each stage between these two sets. In fact, it 
is notationally easier to consider a process whereby one simply has a set of n nodes, with 
two unhappy nodes of opposite type selected at each stage (if such exist), which may then 
both change type (when this occurs we shall still refer to the nodes as 'swapping', but 
now they are swapping type rather than location). Thus nodes are identified with indices
for their locations amongst the set {0, 1, ..., n − 1}, and unless stated otherwise, addition
and subtraction on these indices are performed modulo n. In the context of discussing
a node u_1, for example, we might refer to the immediate neighbour on the right as node
u_1 + 1. Since we work modulo n it is worth clarifying some details of the interval notation:
for 0 ≤ b < a < n, we let [a, b] denote the set of nodes ('interval') [a, n − 1] ∪ [0, b] (while
[b, a] is, of course, understood in the standard way).

As noted before, for any node u, we let N(u) denote the neighbourhood of u, which is
the interval [u − w, u + w]. For any set of nodes I, suppose that x is the number of α
nodes in I, while y = |I| − x is the number of β nodes in I. Then x − y is called the bias
of I. By the bias of a node we mean the bias of its neighbourhood. Recall that by a run of
length m + 1 we mean an interval [u, u + m] in which all nodes are of the same type.
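
The modular interval convention and the bias can be made concrete as follows. This is only an illustrative sketch: the helper names `interval`, `neighbourhood` and `bias` are not from the paper, types are encoded as 'a' (for α) and 'b' (for β), and n is assumed to exceed 2w + 1 so that a neighbourhood never wraps onto itself.

```python
def interval(a, b, n):
    """Nodes of the circular interval [a, b]; wraps through n-1 and 0 when b < a."""
    a, b = a % n, b % n
    if a <= b:
        return list(range(a, b + 1))
    return list(range(a, n)) + list(range(0, b + 1))


def neighbourhood(u, w, n):
    """N(u) = [u - w, u + w], with indices taken modulo n."""
    return interval(u - w, u + w, n)


def bias(nodes, config):
    """(number of alpha nodes) minus (number of beta nodes) among `nodes`."""
    x = sum(config[v] == "a" for v in nodes)
    return x - (len(nodes) - x)


config = list("aabababbba")
print(neighbourhood(0, 2, len(config)))                 # [8, 9, 0, 1, 2]
print(bias(neighbourhood(0, 2, len(config)), config))   # 1
```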

We shall be particularly interested in local configurations which are stable, in the sense
that certain nodes in them can never be caused to change type. Note that if an interval
of length w + 1 contains at least τ(2w + 1) many α nodes, then each of those α nodes
is happy so long as the others do not change type, meaning that, in fact, no α nodes
in that interval will ever change type. We say that such an interval of length w + 1 is
α-stable (and similarly for β). An interval of length w + 1 is stable if it is either α-stable
or β-stable. We shall also make use of a particular kind of stable interval which was used
in [2]: a firewall is a run of length at least w + 1. We write 'for 0 ≪ w ≪ n', to mean 'for
all sufficiently large w and all n sufficiently large compared to w'.
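
The definitions of stable intervals and firewalls translate directly into code. The sketch below is illustrative only: the names `stable_types` and `is_firewall` are assumptions, types are encoded as 'a' and 'b', and `is_firewall` tests only a window of length exactly w + 1 (a longer run contains such a window).

```python
def stable_types(config, start, w, tau):
    """Types t for which the interval [start, start + w] (length w + 1) is t-stable."""
    n = len(config)
    window = [config[(start + i) % n] for i in range(w + 1)]
    return [t for t in "ab" if window.count(t) >= tau * (2 * w + 1)]


def is_firewall(config, start, w):
    """True if the w + 1 nodes starting at `start` are all of one type."""
    n = len(config)
    return len({config[(start + i) % n] for i in range(w + 1)}) == 1


config = list("aaabbbbbabab")
print(stable_types(config, 4, 3, 0.5))  # ['b']
print(stable_types(config, 0, 3, 0.5))  # []
print(stable_types(config, 0, 3, 0.4))  # ['a']
```

With w = 3 and τ = 0.5 the requirement is 3.5, so only a window of four equal nodes is stable; lowering τ to 0.4 lowers the requirement to 2.8 and the window 'aaab' becomes α-stable.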






Arguing that the process ends. We define the harmony index corresponding to
any given configuration to be the sum over all nodes of the number of their own type
within their neighbourhood. For τ ≤ 1/2, this harmony index is easily seen to strictly
increase whenever an unhappy node changes type, which combined with the existence of
an upper bound n(2w + 1), implies that the process must terminate after finitely many
stages. For τ > 1/2, we shall argue that with probability tending to 1 as n → ∞, the initial
configuration is such that complete segregation eventually occurs with probability 1. Once
complete segregation has occurred it is easy to see that all future states must be completely
segregated, but that 'rotations' can occur, i.e. if the nodes of type α are precisely the
interval [a, b] at stage s, then at stage s + 1 they must be either [a − 1, b − 1], [a, b] or
[a + 1, b + 1].
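
As a small illustration of the harmony index, the sketch below computes it for a configuration with a single isolated β node and for the configuration obtained when that node changes type. The function name `harmony_index` and the 'a'/'b' encoding are assumptions made for the example; each node is counted as a member of its own neighbourhood, as in the definition of N(u).

```python
def harmony_index(config, w):
    """Sum over all nodes u of the number of nodes in N(u) sharing u's type."""
    n = len(config)
    return sum(
        config[(u + i) % n] == config[u]
        for u in range(n)
        for i in range(-w, w + 1)
    )


before = list("aaaaabaaaa")   # one isolated (unhappy) beta node
after = list("aaaaaaaaaa")    # the same configuration after it changes type
print(harmony_index(before, 2))  # 42
print(harmony_index(after, 2))   # 50, the upper bound n(2w + 1)
```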



3. The case τ < κ

The analysis here is identical for the standard and simple models. Of course, we are
yet to explain how κ comes to be defined as described previously - we shall do so shortly.
The basic idea is that we wish to find the value of τ at which stable intervals become
more likely than unhappy nodes in the initial configuration. For such τ, taking w large,
we shall have that stable intervals are much more likely than unhappy nodes in the initial
configuration. If u is selected uniformly at random then we shall almost certainly have
stable intervals of both types on either side of u before any unhappy elements, meaning
that u never changes type.

Recall that an interval of length w + 1 is β-stable if it contains at least (2w + 1)τ
many nodes of type β. In the initial configuration this can be modelled with a binomial
distribution X ~ b(w + 1, 1/2), from which we are interested in the probability P_stab =
P(X ≥ (2w + 1)τ). Similarly, an α node is unhappy if its neighbourhood contains more
than (2w + 1)(1 − τ) many β nodes. We model this as Y ~ b(2w, 1/2), from which we are
interested in the probability P_unhap = P(Y > (2w + 1)(1 − τ)). In fact, for now we will
use the approximations X ~ b(w, 1/2) where P_stab = P(X ≥ 2wτ), and Y ~ b(2w, 1/2) where
P_unhap = P(Y ≥ 2w(1 − τ)) (later it will be easy to observe that these approximations
suffice, and that it does not matter that we have considered the probability that an
interval of length w + 1 is β-stable, rather than simply the probability that it is stable). We are
interested in finding conditions on w and τ which determine whether the ratio
P_unhap / P_stab is small or large.
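
For moderate w the two approximating tail probabilities can be evaluated exactly, which gives a feel for how the ratio behaves on either side of the threshold. The sketch below is an illustration under stated assumptions: the helper names `binom_tail`, `p_stab` and `p_unhap` are chosen here, τ is passed as an exact `Fraction`, and the thresholds 2wτ and 2w(1 − τ) are rounded up to integers.

```python
from fractions import Fraction
from math import ceil, comb


def binom_tail(n, k):
    """P(X >= k) for X ~ b(n, 1/2), computed from exact integer binomial coefficients."""
    k = max(k, 0)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def p_stab(w, tau):
    """Approximation P(X >= 2w*tau) with X ~ b(w, 1/2)."""
    return binom_tail(w, ceil(2 * w * tau))


def p_unhap(w, tau):
    """Approximation P(Y >= 2w*(1 - tau)) with Y ~ b(2w, 1/2)."""
    return binom_tail(2 * w, ceil(2 * w * (1 - tau)))


for tau in (Fraction(3, 10), Fraction(2, 5)):
    ratios = [p_unhap(w, tau) / p_stab(w, tau) for w in (50, 100, 200)]
    print(tau, ratios)
```

For τ = 3/10 the printed ratios shrink rapidly as w grows, while for τ = 2/5 they grow rapidly, in line with a threshold lying between these two values.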

Now, by the Central Limit Theorem, for large w the binomial distribution approaches 
the normal distribution, but, in fact, this is not enough for our purposes. We need a good 
estimate as to the rate at which this convergence to the normal distribution takes place. 
To this end, we apply a powerful result of McKay based on an estimate of Littlewood 
for the tail of the binomial distribution, see Theorem 2 of [15]. We use the notation 

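\[ \varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \quad\text{and}\quad Q(x) = \int_x^{\infty} \varphi(u)\,du. \]

The McKay-Littlewood theorem gives:

\[ P_{\mathrm{stab}} = \frac{\sqrt{w}}{2} \binom{w-1}{h-1} \frac{1}{2^{w-1}} \cdot \frac{Q(\sqrt{w}(4\tau-1))}{\varphi(\sqrt{w}(4\tau-1))} \cdot e^{E_1}, \]

where h = ⌈2wτ⌉ and 0 ≤ E_1 ≤ min{ √(π/(2w)), 2/(w(4τ − 1)) }.

Similarly, we get

\[ P_{\mathrm{unhap}} = \frac{\sqrt{w}}{\sqrt{2}} \binom{2w-1}{h'-1} \frac{1}{2^{2w-1}} \cdot \frac{Q(\sqrt{2w}(1-2\tau))}{\varphi(\sqrt{2w}(1-2\tau))} \cdot e^{E_2}, \]

where h' = ⌈2w(1 − τ)⌉ and 0 ≤ E_2 ≤ min{ √(π/(4w)), 1/(w(1 − 2τ)) }. Note that for τ < 1/2 we have
h < h'. Now what we are really interested in is:

\[ \frac{P_{\mathrm{unhap}}}{P_{\mathrm{stab}}} = \frac{\sqrt{2}}{2^{w}} \cdot \frac{\binom{2w-1}{h'-1}}{\binom{w-1}{h-1}} \cdot \frac{\varphi(\sqrt{w}(4\tau-1))}{Q(\sqrt{w}(4\tau-1))} \cdot \frac{Q(\sqrt{2w}(1-2\tau))}{\varphi(\sqrt{2w}(1-2\tau))} \cdot e^{E_3}, \]

where E_3 = E_2 − E_1 is such that −√(π/(2w)) ≤ E_3 ≤ √(π/(4w)). We can bound each of the terms in
this expression. To start with, we can use the following standard bounds: for any x > 0,

\[ \frac{2}{x + \sqrt{x^2 + 4}} \le \frac{Q(x)}{\varphi(x)} \le \frac{2}{x + \sqrt{x^2 + 2}}. \]

For some constants C_1 and C_2 (which may depend on τ but not on w), we get:

\[ C_1 \cdot \frac{1}{2^{w}} \cdot \frac{\binom{2w-1}{h'-1}}{\binom{w-1}{h-1}} \cdot \frac{(4\tau-1) + \sqrt{(4\tau-1)^2 + \frac{2}{w}}}{\sqrt{2}(1-2\tau) + \sqrt{2(1-2\tau)^2 + \frac{4}{w}}} \le \frac{P_{\mathrm{unhap}}}{P_{\mathrm{stab}}} \]

\[ \frac{P_{\mathrm{unhap}}}{P_{\mathrm{stab}}} \le C_2 \cdot \frac{1}{2^{w}} \cdot \frac{\binom{2w-1}{h'-1}}{\binom{w-1}{h-1}} \cdot \frac{(4\tau-1) + \sqrt{(4\tau-1)^2 + \frac{4}{w}}}{\sqrt{2}(1-2\tau) + \sqrt{2(1-2\tau)^2 + \frac{2}{w}}}. \]

On the assumption that 1/4 < τ < 1/2, we can easily bound the fractions involving τ and
absorb these bounds into C_1 and C_2, giving:

\[ C_1 \cdot A \le \frac{P_{\mathrm{unhap}}}{P_{\mathrm{stab}}} \le C_2 \cdot A \tag{3.0.1} \]

where

\[ A := \frac{1}{2^{w}} \cdot \frac{\binom{2w-1}{h'-1}}{\binom{w-1}{h-1}}. \]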
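
The McKay-Littlewood estimate can be checked numerically in the symmetric case p = 1/2 used here. The sketch below assumes the usual statement of McKay's theorem, P(X ≥ k) = σ · b(k − 1; n − 1, p) · Q(x)/φ(x) · exp(E/σ) with σ = √(npq), x = (k − np)/σ and 0 ≤ E ≤ min{√(π/8), 1/x}; the quantity written E_1 above then corresponds to E/σ. The function names are illustrative, and Q is evaluated through the complementary error function.

```python
from math import comb, erfc, exp, log, pi, sqrt


def phi(u):
    """Standard normal density."""
    return exp(-u * u / 2) / sqrt(2 * pi)


def Q(x):
    """Upper tail of the standard normal distribution."""
    return 0.5 * erfc(x / sqrt(2))


def mckay_error_term(n, k):
    """For X ~ b(n, 1/2) and n/2 < k <= n, return (E, x) where
    P(X >= k) = sigma * b(k-1; n-1, 1/2) * Q(x)/phi(x) * exp(E / sigma)."""
    sigma = sqrt(n) / 2
    x = (k - n / 2) / sigma
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    main = sigma * (comb(n - 1, k - 1) / 2 ** (n - 1)) * Q(x) / phi(x)
    return sigma * log(tail / main), x


for n, k in ((4, 3), (100, 60), (400, 280)):
    E, x = mckay_error_term(n, k)
    print(n, k, E, min(sqrt(pi / 8), 1 / x))
```

In each case the computed E should lie between 0 and the printed bound, which is what makes the exponential factors harmless when the ratio of the two probabilities is formed.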



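To deal with A, we use the following estimates from Stirling's formula: for all positive
integers n,

\[ \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} \le n! \le e\, n^{n+\frac{1}{2}} e^{-n}. \]

Applying these bounds we find that for constants D_1 and D_2:

\[ D_1 \le A \cdot 2^{w} \cdot \frac{(h'-1)^{h'-\frac{1}{2}}\,(2w-h')^{2w-h'+\frac{1}{2}}\,(w-1)^{w-\frac{1}{2}}}{(2w-1)^{2w-\frac{1}{2}}\,(h-1)^{h-\frac{1}{2}}\,(w-h)^{w-h+\frac{1}{2}}} \le D_2. \]

Now, using the fact that h = ⌈2wτ⌉ and h' = ⌈2w(1 − τ)⌉ (but ignoring the ceiling issues
for now) we note that the total power of w on the top and bottom of the fraction is equal
to 3w − 1/2. Dividing out and absorbing constant terms into D_1 and D_2 we get:

\[ D_1 \le A \cdot 2^{w} \cdot \frac{\left(1-\frac{1}{w}\right)^{w-\frac{1}{2}} \left(2(1-\tau)-\frac{1}{w}\right)^{2w(1-\tau)-\frac{1}{2}} (2\tau)^{2w\tau+\frac{1}{2}}}{\left(2-\frac{1}{w}\right)^{2w-\frac{1}{2}} \left(2\tau-\frac{1}{w}\right)^{2w\tau-\frac{1}{2}} (1-2\tau)^{w(1-2\tau)+\frac{1}{2}}} \le D_2, \]

and

\[ D_1 \le A \cdot 2^{w} \cdot \frac{\left(1-\frac{1}{w}\right)^{w-\frac{1}{2}}\, 2^{2w-2w\tau-\frac{1}{2}} \left(1-\frac{1}{2w(1-\tau)}\right)^{2w-2w\tau-\frac{1}{2}} (1-\tau)^{2w(1-\tau)-\frac{1}{2}}\, (2\tau)^{2w\tau+\frac{1}{2}}}{2^{2w-\frac{1}{2}} \left(1-\frac{1}{2w}\right)^{2w-\frac{1}{2}} (2\tau)^{2w\tau-\frac{1}{2}} \left(1-\frac{1}{2w\tau}\right)^{2w\tau-\frac{1}{2}} (1-2\tau)^{w(1-2\tau)+\frac{1}{2}}} \le D_2. \]

For x > 2, 1/4 < (1 − 1/x)^x < 1. Furthermore, the reintroduction of the ceiling function
introduces no difficulty here, as for 0 ≤ δ < 1 representing ⌈x⌉ − x we also have
e^{−1} < (1 + δ/x)^{−x} ≤ 1. Thus:

\[ D_1 \le A \cdot \left( \frac{2^{1-2\tau}\,(1-\tau)^{2(1-\tau)}}{(1-2\tau)^{1-2\tau}} \right)^{w} \le D_2. \tag{3.0.2} \]

Combining (3.0.1) and (3.0.2) and merging constant terms, we get:

\[ C_1 \cdot \left( \frac{(1-2\tau)^{1-2\tau}}{2^{1-2\tau}\,(1-\tau)^{2(1-\tau)}} \right)^{w} \le \frac{P_{\mathrm{unhap}}}{P_{\mathrm{stab}}} \le C_2 \cdot \left( \frac{(1-2\tau)^{1-2\tau}}{2^{1-2\tau}\,(1-\tau)^{2(1-\tau)}} \right)^{w}. \tag{3.0.3} \]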
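
The exponential rate in (3.0.2) can be checked numerically. The sketch below works with logarithms to avoid overflow and compares log A with w times the logarithm of the base (1/2 − τ)^{1−2τ} / (1 − τ)^{2(1−τ)}, which equals (1 − 2τ)^{1−2τ} / (2^{1−2τ}(1 − τ)^{2(1−τ)}). The function names `log_A` and `log_rate` are assumptions made for illustration, and τ is passed as an exact `Fraction` so that the ceilings are computed without rounding error.

```python
from fractions import Fraction
from math import ceil, comb, log


def log_A(w, tau):
    """log of A = 2^{-w} * C(2w-1, h'-1) / C(w-1, h-1), h = ceil(2w tau), h' = ceil(2w(1-tau))."""
    h = ceil(2 * w * tau)
    h_prime = ceil(2 * w * (1 - tau))
    return log(comb(2 * w - 1, h_prime - 1)) - log(comb(w - 1, h - 1)) - w * log(2)


def log_rate(tau):
    """log of the base (1/2 - tau)^(1 - 2 tau) / (1 - tau)^(2(1 - tau))."""
    t = float(tau)
    return (1 - 2 * t) * log(0.5 - t) - 2 * (1 - t) * log(1 - t)


for tau in (Fraction(3, 10), Fraction(2, 5)):
    for w in (100, 1000):
        # The difference stays bounded as w grows, as (3.0.2) asserts.
        print(tau, w, log_A(w, tau) - w * log_rate(tau))
```

The sign of `log_rate` is negative at τ = 3/10 and positive at τ = 2/5, so A decays exponentially in the first case and grows exponentially in the second.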




Now put f(x) = x^{2x}. Then the value κ we are looking for is exactly τ such that

\[ f\left(\tfrac{1}{2} - \tau\right) = f(1 - \tau). \]

In other words x = 1/2 − κ is exactly such that

\[ f(x) = f\left(x + \tfrac{1}{2}\right). \]

Since f has a unique turning point at e^{−1}, it follows that κ is unique, and numerical
analysis gives

κ ≈ 0.353092313

(which is just slightly less than √2/4).

Figure 2. y = x^{2x}
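
The numerical value of κ can be reproduced with a few lines of bisection. This is a sketch rather than the numerical method used by the authors: the function names `g` and `solve_kappa`, the bracketing interval [0.25, 0.45] and the iteration count are choices made here. The function g is the logarithm of f(1/2 − τ)/f(1 − τ) with f(x) = x^{2x}; it is increasing in τ on (0, 1/2), so its zero is unique there.

```python
from math import log, sqrt


def g(tau):
    """log f(1/2 - tau) - log f(1 - tau), where f(x) = x**(2*x)."""
    return (1 - 2 * tau) * log(0.5 - tau) - 2 * (1 - tau) * log(1 - tau)


def solve_kappa(lo=0.25, hi=0.45, iters=60):
    """Bisection for the zero of g; assumes g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


kappa = solve_kappa()
print(kappa, sqrt(2) / 4)  # kappa is slightly below sqrt(2)/4
```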



Now from (3.0.3) it is clear that these conclusions are unaffected by our use of the
approximations X ~ b(w, 1/2) where P_stab = P(X ≥ 2wτ), and Y ~ b(2w, 1/2) where
P_unhap = P(Y ≥ 2w(1 − τ)). Since, for fixed τ, the actual probabilities that in the initial
configuration a given interval of length w + 1 is β-stable or that a given node is unhappy,
are to within a fixed multiplicative constant of these approximations, (3.0.3) remains true
when P_unhap and P_stab are replaced by these actual probabilities. The analysis above also
considers a specific interval of length w + 1 to which a node might belong and which might
be stable, and so does not overtly take account of the fact that a given node belongs to
w + 1 many intervals of length w + 1 which might be stable. Again, (3.0.3) shows that
this does not matter - when τ < κ and k, r > 0, for all sufficiently large w, if u is chosen
uniformly at random then, in the initial configuration, it is more than kw^r times as likely
that u belongs to a stable interval than that it is an unhappy node. When τ > κ the
reverse is true. We need something more than this though. For any u, let x_u be the first
node to the left of u which, in the initial configuration, is either unhappy or else belongs
to an interval which is stable. We must show that, for any ε' > 0, if 0 ≪ w ≪ n (i.e. if w
is large and n is large compared to w) then the following occurs with probability > 1 − ε'
for u chosen uniformly at random: x_u is defined and no node in [x_u − 2w, x_u] is unhappy
in the initial configuration. From this we can easily argue that u selected uniformly at
random will almost certainly have stable intervals of both types on either side in the
initial configuration before any unhappy nodes, as required. The following lemma (which
is stated in such a way as to be general purpose, so we can apply it again later) allows us
to do this:



Lemma 3.1. Let P_u and Q_u be events which only depend on the neighbourhood of u in the
initial configuration, meaning that if the neighbourhood of v in the initial configuration is
identical to that of u (i.e. for all i ∈ [−w, w], u + i is of the same type as v + i), then P_u
holds iff P_v holds and Q_u holds iff Q_v holds. Suppose also that:

(i) P(P_u) ≠ 0 and P(Q_u) ≠ 0.

(ii) For all k, for all sufficiently large w, P(P_u) / P(Q_u) > kw.

For any u, let x_u be the first node to the left¹ of u such that either P_{x_u} or Q_{x_u} holds. For
any ε > 0, if 0 ≪ w ≪ n then the following occurs with probability > 1 − ε for u chosen
uniformly at random: x_u is defined and for no node v in [x_u − 2w, x_u] does Q_v hold.
An analogous result holds when 'left' is replaced by 'right'.

Proof. For a general node u, we'll say it is of type 1 if P_u holds and no node u' ∈ [u − 2w, u]
satisfies Q_{u'}, and of type 2 if Q_u holds. Let π be the probability, for u chosen uniformly
at random, that x_u is defined and of type 1. Our aim is to show that π > 1 − ε, for all
0 ≪ w ≪ n. We define an iteration which assigns colours to nodes as follows.

Step 0. Pick a node t_0 uniformly at random.

Step s + 1. Let v_s be the first node to the left of t_s, such that v_s = x_{t_s} or such that s > 0
and v_s = t_0. Carry out the instructions for the first case below which applies:

(1) If there exists no such v_s then terminate the iteration.

(2) If v_s = t_0 and s > 0 then make t_s undefined and terminate the iteration.

(3) If there exist any nodes x in [v_s − 2w, v_s] such that Q_x holds, then colour t_s black,
otherwise v_s is of type 1 and we colour t_s white. Define t_{s+1} = v_s − (2w + 1), unless
t_0 lies in the interval [v_s − (2w + 1), v_s), in which case terminate the iteration.

This completes the description of the iteration. Let S be the maximum value of s for
which t_s is defined and coloured.

First note that hypothesis (i) in the statement of the lemma guarantees that S → ∞
as n → ∞. Similarly, we may assume that at least one t_s is coloured black. Now let π' be
the proportion of the t_s which are coloured white, and observe that with high probability
π' → π as n → ∞, i.e. for all ε' > 0, for all sufficiently large n, we have |π − π'| < ε' with
probability > 1 − ε'. In order to see this, consider the situation at the beginning of step s + 1
of the iteration, when s > 0. In order to define t_s, we moved left until finding v_{s−1} and then
moved a further 2w + 1 nodes to the left. So far, then, nothing that has happened in the
iteration tells us anything about the neighbourhood of t_s and all those nodes to the left
of t_s and strictly to the right of t_0. So long as S ≥ s, the way in which t_s is coloured
depends only on these nodes.

So, we now wish to show that for all sufficiently large w, with probability tending to 1
as n → ∞, we have π' > 1 − ε. Let ρ be the ratio of type 1 nodes to type 2 nodes (in
[0, n − 1]), and consider the interval (t_{s+1}, t_s]. However t_s is coloured, we have at most
(2w + 1) many type 1 nodes in here, namely some subset of [v_s − 2w, v_s]. Similarly, if t_s
is coloured black then we get at least one type 2 node in here. Thus, summing over all
intervals we find that

\[ \rho \le \frac{(2w+1)\,|\{ s \le S \}|}{|\{ s \le S : t_s \text{ is black} \}|} \]

which is to say

\[ 1 - \pi' \le \frac{2w+1}{\rho}. \]

So if we can show that ρ ≫ 2w + 1, it will follow that π' is close to 1 as required. Let
p_1 and p_2 be the probabilities that uniformly randomly selected u itself is of type 1 and
2 respectively. Then, by the law of large numbers, we have |ρ − p_1/p_2| < ε' with probability
approaching 1 as n → ∞. Now property (ii) from the statement of the lemma means that
for each k, for all sufficiently large w, p_1/p_2 > kw, giving the result. □

¹ By the first node to the left of u satisfying a certain condition we mean the first in the sequence
u, u − 1, u − 2, ... which satisfies the condition.
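
The object x_u and the event considered in Lemma 3.1 can be expressed directly in code, which may help when checking the lemma empirically on simulated configurations. The sketch below is illustrative: the names `first_to_left` and `clean_hit` are not from the paper, and the events P and Q are supplied as arbitrary predicates on node indices.

```python
def first_to_left(u, holds, n):
    """First node in the sequence u, u-1, u-2, ... (mod n) at which `holds` is true,
    or None if no node on the circle qualifies."""
    for step in range(n):
        v = (u - step) % n
        if holds(v):
            return v
    return None


def clean_hit(u, P, Q, w, n):
    """True if x_u is defined and Q_v fails for every node v in [x_u - 2w, x_u]."""
    x = first_to_left(u, lambda v: P(v) or Q(v), n)
    if x is None:
        return False
    return not any(Q((x - i) % n) for i in range(2 * w + 1))


# Toy example on a circle of 10 nodes: P holds only at node 3, Q only at node 7.
P = lambda v: v == 3
Q = lambda v: v == 7
print(clean_hit(5, P, Q, 1, 10))  # True: x_5 = 3 and node 7 is not in [1, 3]
print(clean_hit(8, P, Q, 1, 10))  # False: x_8 = 7, where Q holds
```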

We now want to use Lemma 3.1 to show:

(†₁) For any ε > 0, for 0 ≪ w ≪ n, if u is selected uniformly at random, then with
probability > 1 − ε, there exist u_2 < u_1 < u such that u_1 and u_2 belong to stable
intervals of opposite types, |u_1 − u_2| > 2w, |u − u_1| > 2w and there are no unhappy
nodes in the interval [u_2, u].

(Of course a similar result will then hold to the right of u.) Choose ε' ≪ ε. For any v,
let P_v be the event that v belongs to a stable interval in the initial configuration, let Q_v
be the event that v is an unhappy node in the initial configuration and let x_v be the first
node u to the left of v such that either P_u or Q_u holds. By Lemma 3.1 and (3.0.3), since
τ < κ we may let w be large enough (and n large enough compared to w, as always) such
that the following occurs with probability > 1 − ε' for u chosen uniformly at random: x_u
is defined and for no node v in [x_u − 2w, x_u] does Q_v hold. Also, let ε'' < ε' be such that
1 − ε'' is exactly the probability, for u chosen uniformly at random, that x_u is defined and
for no node v in [x_u − 2w, x_u] does Q_v hold. We inductively define a sequence of nodes,
much as in the proof of Lemma 3.1.

Step 0. Pick a node u uniformly at random. If x_u is undefined, or u ∈ [x_u − 2w, x_u], or
if any node z in [x_u − 2w, x_u] satisfies Q_z, then declare that the iteration has terminated
unsuccessfully. Otherwise, let y = x_u − (2w + 1) and consider x_y. If x_y is undefined,
or u ∈ [x_y − 2w, x_y], or if any node z in [x_y − 2w, x_y] satisfies Q_z, then declare that the
iteration has terminated unsuccessfully. Otherwise let γ be the type of a stable interval
containing x_y, define u_1 = x_y and t_0 = x_y − (2w + 1).

Step s + 1. Let v be the first node to the left of t_s, such that v = x_{t_s} or such that v = u.
Carry out the instructions for the first case below which applies:

(1) If u ∈ [v − 2w, v] then declare that the iteration has terminated unsuccessfully.

(2) If there exist any nodes z in [v − 2w, v] such that Q_z holds, then declare that
the iteration has terminated unsuccessfully. Otherwise, if v belongs to a stable
interval which is not of type γ then define u_2 = v and declare that the iteration
has terminated successfully. If the previous cases do not apply then define t_{s+1} =
v − (2w + 1) and proceed to the next step of the iteration.

Note first that if the iteration terminates successfully then u_1 and u_2 exist as required.
Let us say that the iteration ends prematurely if either of x_u or x_y is undefined at step
0, or if u ∈ [x_u − 2w, x_u] ∪ [x_y − 2w, x_y], or else there exists i > 0 such that the iteration
has not terminated strictly prior to step i and case (1) applies at step i (so, roughly, the
iteration ends prematurely if it runs out of nodes before finding either an unhappy node
or both u_1 and u_2). Letting ζ be the probability that the iteration ends prematurely, note
that ζ → 0 as n → ∞ (for fixed w). Then (for ε″ as specified above) the probability that
the iteration terminates unsuccessfully is less than:

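\[ \zeta + \varepsilon' + (1-\varepsilon'')\varepsilon' + (1-\varepsilon'')^{2}\varepsilon' + (1-\varepsilon'')^{3}\tfrac{1}{2}\varepsilon' + (1-\varepsilon'')^{4}\tfrac{1}{4}\varepsilon' + \cdots \]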

So for sufficiently large n, the probability the iteration terminates unsuccessfully is less
than 4ε′, establishing (†).

Definition 3.2. For γ ∈ {α, β}, let γ* be the opposite type, i.e. γ* = α if γ = β, and
γ* = β otherwise.

To complete the argument, we now show that the existence of stable intervals of opposite
types on both sides before any unhappy nodes suffices to prevent type changes, i.e. that
(†) and its mirror image give the desired result. So suppose that u_0 satisfies the property
that, in the initial configuration, there exist u_{−2} < u_{−1} < u_0 < u_1 < u_2 such that:

• For i ∈ [−2, 1], |u_{i+1} − u_i| > 2w.

• For i ∈ [−2, 2] − {0}, u_i belongs to a γ_i-stable interval, I_i say. Also γ_{−2} = γ*_{−1} and
γ_2 = γ*_1.

• There are no unhappy nodes in the interval [u_{−2}, u_2].

For i ∈ [−2, 2] − {0}, let I_i = [a_i, b_i]. Then it follows directly by induction on stages that
nodes in the interval [a_{−2}, a_{−1}) can only change from γ_{−1} to γ_{−2}, that nodes in the interval
[a_{−1}, b_1] cannot change type, and that nodes in the interval (b_1, b_2] can only change from
γ_1 to γ_2.

4. The case κ < τ < 1/2

We work first with the simple model, and then supply the necessary modifications for
the standard case. In what follows we shall work with some fixed τ in the interval (κ, 1/2),
some fixed ε > 0, and we shall assume that n is large compared to w. We want to show
that there exists a constant d such that for all sufficiently large w the probability that a
randomly chosen element will belong to a run of length > e^{w/d} is greater than 1 − ε. Of
course, proving the result for all sufficiently large w suffices to give the result for all w
since one can simply adjust the choice of d to deal with finitely many small values, but
we shall make frequent use of the fact that we need only work for all sufficiently large w
in what follows and so stating the theorem in this way is instructive.

Our entire analysis takes place relative to a node u_0 which is chosen uniformly at random.
In order to specify how the type changing process can be expected to develop in
the vicinity¹ of u_0, we shall detail a finite number of events Q_i, such as a lack of unhappy
nodes in the neighbourhood of u_0 in the initial configuration, for example. Our aim is to
show that, for any ε′ > 0, the probability that all the Q_i occur is greater than 1 − ε′ for
all sufficiently large w (and for n sufficiently large compared to w). Suppose that this is
established for the finite set of events Q_i with i ∈ Π (for some finite set Π). To establish
the result when a new event Q_j with j ∉ Π is included, it then suffices to prove for each
ε′ > 0 and all sufficiently large w, that the probability of Q_j given P is greater than 1 − ε′,
where P is any conjunction (possibly empty) of the Q_i such that i ∈ Π. Of course, we
choose that P which is most convenient to work with.

Recall that for any set of nodes I, Θ(I) is the bias of I and that by the bias of a node
we mean the bias of its neighbourhood.

The initial configuration. Let us begin by considering what can be expected from the
initial configuration in the vicinity of u_0. In general, if x_1, ..., x_k are independent random
variables with P(x_i = 1) = P(x_i = −1) = 1/2 when 1 ≤ i ≤ k, then letting X = Σ_{i=1}^{k} x_i,
Hoeffding's inequality gives, for arbitrary λ > 0:

\[ P\left(|X| > \lambda\sqrt{k}\right) \le 2e^{-\lambda^{2}/2}. \]



Now we use this to bound the probability that a node u has bias in the initial configuration
which will cause it to be unhappy, should u be of the minority type in its neighbourhood.
So, we wish to bound the probability that the number of α nodes in N(u) is > (1 −
τ)(2w + 1) or the number of β nodes in N(u) is > (1 − τ)(2w + 1). This corresponds to
a bias Θ(N(u)) of > (1 − 2τ)(2w + 1) or < −(1 − 2τ)(2w + 1).

Definition 4.1. When Θ(N(u)) > (1 − 2τ)(2w + 1) or Θ(N(u)) < −(1 − 2τ)(2w + 1)
we say that u has high bias, denoted Hb(u). If this holds for the initial configuration, we
say that Hb*(u) holds.
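The following is a minimal sketch of Definition 4.1, not code from the paper. It assumes nodes sit on a ring, that types are encoded as +1 for α and −1 for β, and that the bias is the number of α nodes minus the number of β nodes in N(u) = [u − w, u + w], which is the reading consistent with the thresholds above. The function names are hypothetical.

```python
def neighbourhood(config, u, w):
    """Types in N(u) = [u - w, u + w] on a ring.

    config is a list of +1 (alpha) and -1 (beta); this encoding is an assumption.
    """
    n = len(config)
    return [config[(u + i) % n] for i in range(-w, w + 1)]


def bias(config, u, w):
    """Theta(N(u)): number of alpha nodes minus number of beta nodes in N(u)."""
    return sum(neighbourhood(config, u, w))


def has_high_bias(config, u, w, tau):
    """Hb(u): |Theta(N(u))| > (1 - 2*tau) * (2*w + 1)."""
    return abs(bias(config, u, w)) > (1 - 2 * tau) * (2 * w + 1)


if __name__ == "__main__":
    ring = [1] * 7 + [-1] * 3
    print(bias(ring, 2, 2), has_high_bias(ring, 2, 2, 0.3))
    print(bias(ring, 8, 2), has_high_bias(ring, 8, 2, 0.3))
```

Since N(u) has 2w + 1 nodes, the bias is always an odd integer, which is why the borderline values considered later are odd.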

Definition 4.2. For the remainder of this section we define d = (JzItp ■ 

Lemma 4.3 (Likely happiness). Let u be a node chosen uniformly at random. For any
ε′ > 0 and for all sufficiently large w, the probability that Hb*(u) holds is < ε′e^{−w/d}.

Proof. Putting λ√(2w + 1) = (1 − 2τ)(2w + 1) in Hoeffding's inequality
P(|X| > λ√k) ≤ 2e^{−λ²/2} above (with k = 2w + 1), we get

\[ \frac{\lambda^{2}}{2} = \frac{(1-2\tau)^{2}(2w+1)}{2}, \]

so λ²/2 > (1 − 2τ)²w and the probability that u has high bias is at most 2e^{−(1−2τ)²w}.
We chose d > 1/(1 − 2τ)², which means that for any ε′ > 0 and for all sufficiently large w, the
probability that u has high bias in the initial configuration is < ε′e^{−w/d}, as required. □

¹ To be clear, the term 'vicinity' of u_0 is used informally here, to mean, roughly, an interval containing
u_0, which may be large compared to w but which is small compared to n.
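As a numerical sanity check of the bound used in this proof, the sketch below compares the exact probability of high bias for 2w + 1 independent fair ±1 nodes with the Hoeffding bound 2e^{−(1−2τ)²(2w+1)/2}. The function names are hypothetical and the values of w and τ in the usage lines are illustrative only.

```python
import math


def exact_high_bias_prob(w, tau):
    """Exact P(|X| > (1 - 2*tau)(2w + 1)) for X a sum of 2w + 1 fair +/-1 variables."""
    k = 2 * w + 1
    threshold = (1 - 2 * tau) * k
    # X = 2a - k, where a ~ Binomial(k, 1/2) counts the alpha nodes.
    favourable = sum(
        math.comb(k, a) for a in range(k + 1) if abs(2 * a - k) > threshold
    )
    return favourable / 2 ** k


def hoeffding_bound(w, tau):
    """2 * exp(-lambda^2 / 2) with lambda * sqrt(2w + 1) = (1 - 2*tau)(2w + 1)."""
    k = 2 * w + 1
    return 2 * math.exp(-((1 - 2 * tau) ** 2) * k / 2)


if __name__ == "__main__":
    for w in (10, 50, 200):
        print(w, exact_high_bias_prob(w, 0.4), hoeffding_bound(w, 0.4))
```

Both quantities decay exponentially in w, which is the behaviour the lemma exploits.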

Defining the nodes l_i and r_i. For now, we fix some k_0 > 0 (we shall choose a specific
value of k_0 which is appropriate later) and for 1 ≤ i ≤ k_0 we define a node l_i to the left of
u_0 and also a node r_i to the right. We let l_1 be the first node v to the left of u_0 such that
Hb*(v) holds, so long as this node is in the interval [u_0 − n/2, u_0] (otherwise l_1 is undefined).
Then, given l_i for i < k_0 we let l_{i+1} be the first node v to the left of l_i − (2w + 1) such
that Hb*(v) holds, so long as no nodes in the interval [l_{i+1}, l_i] are outside the interval
[u_0 − n/2, u_0] (otherwise l_{i+1} is undefined). We let r_1 be the first node v to the right of u_0
such that Hb*(v) holds, so long as this node is in the interval [u_0, u_0 + n/2]. Given r_i for
i < k_0 we let r_{i+1} be the first node v to the right of r_i + (2w + 1) such that Hb*(v) holds,
so long as no nodes in the interval [r_i, r_{i+1}] are outside the interval [u_0, u_0 + n/2].

The reason for considering the intervals [u_0 − n/2, u_0] and [u_0, u_0 + n/2] in the above is
that we wish to be able to move left from u_0 to l_{k_0} without meeting any of the nodes r_i.
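The scan that picks out these nodes can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the list `high` is assumed to hold the truth value of Hb*(v) for each node v on a ring, "to the left of x" is read as strictly to the left, and the hypothetical parameter `span` stands for how far from u_0 the search may go on either side.

```python
def high_bias_landmarks(high, u0, w, k0, direction, span):
    """Return [l_1, ..., l_k0] (direction=-1) or [r_1, ..., r_k0] (direction=+1).

    high[v] is True when Hb*(v) holds; nodes live on a ring of len(high) nodes.
    An entry is None when the corresponding node is undefined, i.e. when the
    search would move more than `span` nodes away from u0.
    """
    n = len(high)
    found = []
    offset = 1  # "to the left (right) of u0" is read as strictly left (right)
    while len(found) < k0:
        while offset <= span and not high[(u0 + direction * offset) % n]:
            offset += 1
        if offset > span:
            found.extend([None] * (k0 - len(found)))
            break
        found.append((u0 + direction * offset) % n)
        # The next search starts strictly beyond the node at distance 2w + 1.
        offset += 2 * w + 2
    return found


if __name__ == "__main__":
    high = [False] * 40
    for v in (35, 33, 20, 5):
        high[v] = True
    print(high_bias_landmarks(high, 0, 2, 3, -1, 20))
    print(high_bias_landmarks(high, 0, 2, 3, +1, 20))
```

In the usage example node 33 has high bias but is skipped, because it lies within 2w + 1 of the first landmark found on the left.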



Figure 3. Picking out nodes of high bias in the vicinity of u_0.



Definition 4.4 (Defining Q_0). Let d be as in Definition 4.2. It is notationally convenient
to let l_0 = u_0 = r_0. We let Q_0 be the event that for 1 ≤ i ≤ k_0:
(i) l_i and r_i are both defined, and;
(ii) |l_i − l_{i−1}| > e^{w/d} and |r_i − r_{i−1}| > e^{w/d}.

Note that for any fixed w, as n → ∞ the probability that any l_i or r_i is undefined (for
i ≤ k_0) goes to 0. By Lemma 4.3, and since the probability that any node in an interval
I has high bias in the initial configuration is at most Σ_{u∈I} P(Hb*(u)), for any ε′ > 0 and
for any fixed k_0 ≥ 1 we can ensure that P(Q_0) > 1 − ε′ by taking w sufficiently large (and
by taking n sufficiently large compared to w). Thus, for 0 ≪ w ≪ n, the picture we are
presented with is almost certainly as in Figure 3.

So far we have defined an event, Q_0, which ensures that for 1 ≤ i ≤ k_0, each l_i and each
r_i is defined and that these nodes are at very large distances from each other for large w.
The way in which we defined each l_i and r_i means that no node in the interval [l_{k_0}, r_{k_0}]
can be unhappy in the initial configuration unless it is in one of the intervals [l_i − 2w, l_i]
or [r_i, r_i + 2w]. We are yet to choose k_0.

Building the informal picture. Recall that, by a run of length m + 1 we mean an 
interval [u, u + m] in which all nodes are of the same type, and that a firewall is a run of 
length at least w + 1. 
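The notions of run and firewall recalled above can be illustrated with a short sketch. It assumes a ring of nodes encoded as +1 and −1 (an encoding chosen here for illustration), and the function names are hypothetical.

```python
def run_containing(config, u):
    """Length of the maximal run (interval of equal types) containing node u on a ring."""
    n = len(config)
    if all(t == config[u] for t in config):
        return n  # the whole ring is one run
    left = 0
    while config[(u - left - 1) % n] == config[u]:
        left += 1
    right = 0
    while config[(u + right + 1) % n] == config[u]:
        right += 1
    return left + right + 1


def in_firewall(config, u, w):
    """True when u lies in a run of length at least w + 1."""
    return run_containing(config, u) >= w + 1


if __name__ == "__main__":
    ring = [1, 1, 1, 1, -1, -1, 1, 1]
    print(run_containing(ring, 0), in_firewall(ring, 0, 5))
```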

Let us now consider informally what can be expected to happen in the neighbourhood
of l_i. Suppose that l_i is initially of type β and is unhappy in the initial configuration.
Then with high probability for sufficiently large w, there will not be any unhappy nodes
of type α in the neighbourhood of l_i in the initial configuration. If l_i changes type, then
this will make the bias in its neighbourhood still more positive, which may cause further
nodes of type β to become unhappy. If these change to type α then this will further
increase the bias, potentially causing more nodes to become unhappy, and so on. The
following definitions formalise some of the ways in which this process might play out, and
in particular the possibility that this process might play out without interference from
what happens in other neighbourhoods N(l_j) or N(r_j).

Definition 4.5. For 0 < i < k_0 we say that l_i completes at stage s if both:

(1) No node in N(l_i) is unhappy at stage s, and this is not true for any s′ < s.

(2) There exist x_0 and x_1 with l_{i+1} + 2w < x_0 < l_i − 2w < l_i + 2w < x_1 < l_{i−1} − 2w,
such that by the end of stage s, no node in [x_0 − w, x_0] or [x_1, x_1 + w] has changed
type.

We say that l_i completes if it completes at some stage. We also define completion for r_i
analogously.

Definition 4.6. We say that l_i (or r_i) originates a firewall if it completes at some
stage s and belongs to a firewall at that stage.²

The informal idea is that we now wish to show that each l_i and each r_i has some
reasonable chance of originating a firewall. Then we can choose k_0 so that the probability
none of the l_i originate a firewall or none of the r_i originate a firewall is ≪ ε, i.e. with high
probability firewalls will originate either side of u_0 within the interval [l_{k_0}, r_{k_0}]. Then,
letting i_1 be the least i such that l_i originates a firewall, and letting i_2 be the least i
such that r_i originates a firewall, we wish to show that with high probability the firewalls
originated at l_{i_1} and r_{i_2} will spread until u_0 is contained in one of them. Since these two
firewalls have originated at nodes which are at distance at least e^{w/d} apart, u_0 ultimately
belongs to a firewall of at least this length.

² This definition might initially seem to neglect the possibility, for example, that l_i completes at some
stage and does not belong to a firewall at that stage, but that, nevertheless, the sequence of type changes
in its neighbourhood and surrounding neighbourhoods which have led to completion have caused the
creation of a firewall. In fact we shall be able to ensure (Lemma 4.25) that with high probability such
events do not occur.






In order to make this basic picture work, however, we need to be clear about the way
in which firewalls will spread. If the interval [u − w, u] is a firewall of type α, then when
u + 1 is of type β, it cannot be happy unless the interval [u + 1, u + w + 1] is β-stable.
So firewalls will spread until they hit stable intervals of the opposite type. Now suppose
that, with i_1 and i_2 as above, i_1 = i_2 = 2 and, for now, suppose that α-firewalls are
originated at both l_2 and r_2. In order to show that these two firewalls will spread until
they meet each other, it will be helpful first of all to be able to assume that in the initial
configuration there are no stable subintervals of [l_{k_0}, r_{k_0}]. This will follow quite easily for
large w, from our previous analysis of the ratio between the probability of unhappy nodes
and stable intervals. A further danger that we have to be able to avoid, however, is that,
while l_1 and r_1 do not originate firewalls, they do get as far as creating β-stable intervals.

Definition 4.7. Given i with 0 < i < k_0, let u = l_i or u = r_i. Let u_1 = u − (2w + 1) and
u_2 = u + (2w + 1). We say that u subsides if it completes at some stage s, and:

• There are no nodes in [u_1, u_2] belonging to stable intervals at stage s;

• No nodes in N(u_1) or N(u_2) have changed type by the end of stage s.

So we need to be able to show, in fact, that with high probability each l_i and r_i either
originates a firewall or subsides. To do this clearly involves analysing what is likely to
happen in each of the neighbourhoods N(l_i) and N(r_i). First of all, the large distances
between these nodes mean that, for fixed k_0 and sufficiently large w, we can expect all of
the l_i and r_i (for 0 < i < k_0) to complete, so that one can understand the early stages of
the process for each of these neighbourhoods by considering each in isolation. For each
of these neighbourhoods we then wish to show that a certain dichotomy holds: either a
small number of type changes will occur before completion, or else a large number of type
changes will occur before completion and a firewall will be created. Now, if we strengthen
our original requirement that there are no stable subintervals of [l_{k_0}, r_{k_0}] in the initial
configuration, to a requirement that there are no subintervals which are 'close' to being
stable in the initial configuration (where 'close' is to be made precise in such a way as to
ensure that when a small number of type changes occur in the neighbourhood of l_i (or r_i) before
completion, these are not enough to create any stable intervals), then we shall have that
with high probability each l_i and each r_i either originates a firewall or else completes
without creating stable intervals.

Once all this is in place, there is then one further hurdle. In the above, we assumed
that the firewalls originating at l_2 and r_2 are both α-firewalls. If they are firewalls of
opposite type, however, we still have some work to do in order to prove that u_0 will almost
certainly end up belonging to one of these two firewalls.

Formalising the intuitive picture. We now wish to define further events which ensure
that the neighbourhoods of each l_i and r_i are nicely behaved in the initial configuration,
in order that we can establish the dichotomy described above, regarding the number of
changes that can be expected in the neighbourhood of each l_i and r_i prior to completion.






The distributions for these intervals in the initial configuration, however, are a little 
difficult to attack directly because of the nature of their definition. In choosing l_i we move
left until we find the first node which has high bias; this gives an asymmetry to the given
information concerning the neighbourhood. Roughly, we might expect something like a 
hypergeometric distribution, but how good is this as an approximation? What we shall 
do, in fact, is first of all to understand what can be expected from the neighbourhood of 
a node which is chosen uniformly at random from among those with borderline bias: 

Definition 4.8. Let us say that a node u has borderline bias, denoted Bb(u), if:

|Θ(N(u))| = min{ℓ ∈ 2ℕ + 1 : ℓ > (1 − 2τ)(2w + 1)},

i.e., u has high bias but decreasing the modulus of the bias by the minimum possible amount
of 2 would cause it not to have high bias. We say that Bb*(u) holds if u has borderline
bias in the initial configuration.

Note that each of the nodes l_i and r_i for 0 < i < k_0 has borderline bias.
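A minimal sketch of the borderline value in Definition 4.8 follows, using exact rational arithmetic so that thresholds which happen to be integers are handled correctly. The function names are hypothetical, and τ is assumed to be supplied as a `Fraction`.

```python
from fractions import Fraction
import math


def borderline_value(w, tau):
    """Smallest odd positive integer strictly greater than (1 - 2*tau)(2w + 1)."""
    threshold = (1 - 2 * Fraction(tau)) * (2 * w + 1)
    candidate = max(math.floor(threshold) + 1, 1)  # smallest positive integer > threshold
    return candidate if candidate % 2 == 1 else candidate + 1


def is_borderline(bias_value, w, tau):
    """Bb(u), given the bias Theta(N(u)) of the node u."""
    return abs(bias_value) == borderline_value(w, tau)


if __name__ == "__main__":
    tau = Fraction(2, 5)
    print(borderline_value(10, tau))  # threshold 21/5
    print(borderline_value(12, tau))  # threshold exactly 5
```

Moving one position along the ring changes the bias by 0 or ±2, so the first node of high bias met when scanning from a node without high bias must take exactly this borderline value, which is the observation made in the note above.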

In what follows it is often convenient to work with some fixed k > 1 and to divide an 
interval / = [a, b] into k parts of equal length. This occasions the minor inconvenience 
that the length of the interval might not be a multiple of k, motivating the following 
definition: 

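Definition 4.9. Let I = [a, b] and suppose k ≥ 1. We define the subintervals

\[ I(1:k) := \left[a,\; a + \left\lfloor \tfrac{b-a}{k} \right\rfloor\right] := [I(1:k)_1,\, I(1:k)_2] \]

and

\[ I(j:k) := \left[a + \left\lfloor \tfrac{(j-1)(b-a)}{k} \right\rfloor + 1,\; a + \left\lfloor \tfrac{j(b-a)}{k} \right\rfloor\right] := [I(j:k)_1,\, I(j:k)_2] \]

for 2 ≤ j ≤ k.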

In Definition 4.9 the intervals are counted from left to right, but it is also useful some- 
times to work from right to left: 

Definition 4.10. Let I = [a, b] and suppose k ≥ 1. For 1 ≤ j ≤ k we define I(j : k)^− =
I(k − j + 1 : k), I(j : k)^−_1 = I(k − j + 1 : k)_1, and I(j : k)^−_2 = I(k − j + 1 : k)_2.
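Definitions 4.9 and 4.10 can be sketched in a few lines. The code assumes the floor-based reading of Definition 4.9, with I(j : k) running from a + ⌊(j − 1)(b − a)/k⌋ + 1 to a + ⌊j(b − a)/k⌋ for j ≥ 2; the function names are hypothetical.

```python
def subinterval(a, b, j, k):
    """I(j : k) for I = [a, b] and 1 <= j <= k, as an inclusive pair (left, right)."""
    if j == 1:
        return (a, a + (b - a) // k)
    return (a + ((j - 1) * (b - a)) // k + 1, a + (j * (b - a)) // k)


def subinterval_from_right(a, b, j, k):
    """I(j : k)^-, the j-th part counted from right to left."""
    return subinterval(a, b, k - j + 1, k)


if __name__ == "__main__":
    print([subinterval(0, 10, j, 3) for j in (1, 2, 3)])
    print(subinterval_from_right(0, 10, 1, 3))
```

Under this reading the k parts are disjoint, cover [a, b], and have lengths differing by at most one.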

Lemma 4.11 (Smoothness Lemma). Suppose u is such that the proportion of α nodes in
I := N(u) is θ, and that u is selected uniformly at random from nodes with this property.
Then for any fixed k ≥ 1 and ε′ > 0, for all sufficiently large w the following holds with
probability > 1 − ε′: for every j with 1 ≤ j ≤ k, the proportion of the nodes in I(j : k)
which are α nodes lies in the interval [θ − ε′, θ + ε′].

Proof. Once we are given that θ is the proportion of the nodes in I which are of type
α, the nodes in this interval cease to be independently distributed, and the distribution
becomes hypergeometric. Since we consider fixed k and ε′ and take w large, it suffices
to prove the result for given j with 1 ≤ j ≤ k, i.e. if P_1, ..., P_k are events, each of which
occurs with probability tending to 1 as w → ∞, then their conjunction also occurs with
probability tending to 1 as w → ∞. For a given j, the result follows directly, however,
from an application of Chebyshev's inequality and standard results for the mean and
variance of a hypergeometric distribution. Let x be the number of α nodes in the interval
I(j : k) and let ℓ be the length of the interval, so that |ℓ − (2w + 1)/k| < 1. Then we
have:

\[ P\left(|x/\ell - \theta| > \varepsilon'\right) < \ell^{-2}\varepsilon'^{-2}\,\mathrm{Var}(x) = O(\ell^{-1}). \]

□
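The Chebyshev step in this proof can be checked numerically. The sketch below uses the standard variance of a hypergeometric distribution, namely ℓ(K/N)(1 − K/N)(N − ℓ)/(N − 1) for a sample of size ℓ drawn without replacement from N items of which K are marked, and compares the resulting bound with the exact deviation probability. Here N plays the role of 2w + 1 and K of θ(2w + 1); the function names and the numbers in the usage lines are illustrative assumptions.

```python
from fractions import Fraction
import math


def hypergeom_variance(N, K, l):
    """Variance of the number of marked items in a sample of size l (no replacement)."""
    return Fraction(l * K * (N - K) * (N - l), N * N * (N - 1))


def chebyshev_bound(N, K, l, eps):
    """Var(x) / (eps * l)^2, an upper bound on P(|x/l - K/N| > eps)."""
    return hypergeom_variance(N, K, l) / (Fraction(eps) ** 2 * l * l)


def exact_deviation_prob(N, K, l, eps):
    """Exact P(|x/l - K/N| > eps) for x hypergeometric with parameters (N, K, l)."""
    eps = Fraction(eps)
    lo, hi = max(0, l - (N - K)), min(K, l)
    bad = sum(
        math.comb(K, i) * math.comb(N - K, l - i)
        for i in range(lo, hi + 1)
        if abs(Fraction(i, l) - Fraction(K, N)) > eps
    )
    return Fraction(bad, math.comb(N, l))


if __name__ == "__main__":
    eps = Fraction(1, 10)
    for N, K, l in ((201, 120, 50), (2001, 1200, 500)):
        print(N, float(exact_deviation_prob(N, K, l, eps)), float(chebyshev_bound(N, K, l, eps)))
```

Since the variance is at most ℓ/4, the bound is of order 1/ℓ for fixed ε′, in line with the display above.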

Lemma 4.11 basically tells us that if we choose a node u with borderline bias uniformly
at random, then for large w we can expect the bias to move towards 0 fairly smoothly
as we move to u + (2w + 1) or u − (2w + 1). In order to see roughly why this is true,
suppose that |Θ(N(u))| = ρ and let θ be the proportion of the nodes in N(u) which are of
type α. Let I = [u, u + (2w + 1)] and, for some k, consider the sequence of evenly spaced
nodes v_j = I(j : k)_2. Now in forming the neighbourhood of v_j, we lose an interval of length
(almost exactly) ⌊j(2w + 1)/k⌋ from the neighbourhood of u, which by Lemma 4.11 we
can expect to have a proportion of α nodes very close to θ. We also gain an interval of
the same length from outside N(u), which we can expect to have a proportion of α nodes
very close to 1/2. This means a bias for v_j close to ρ(k − j)/k.

The following definition will allow us to express this more formally: 

Definition 4.12. Suppose that Bb*(u) holds. Let I_1 = [u − (2w + 1), u] and I_2 = [u, u +
(2w + 1)]. Let |Θ(N(u))| = ρ and let θ be the proportion of the nodes in N(u) which are
of type α. Suppose that k ≥ 1 is even and ε′ > 0. For 1 ≤ j ≤ k let v_j = I_2(j : k)_2 and
let v_{−j} = I_1(j : k)^−_1. We say that Smooth_{k,ε′}(u) holds if both:

• For every j with 1 ≤ |j| ≤ k, |Θ(N(v_j)) − ρ(k − |j|)/k| / w < ε′.

• For every j with 1 ≤ j ≤ k/2 the proportion of the nodes in I_1(j : k)^− which are
of type α lies in the interval [θ − ε′, θ + ε′], and similarly for I_2(j : k).

We say that Smooth*_{k,ε′}(u) holds if Smooth_{k,ε′}(u) holds in the initial configuration. Figure
4 illustrates the picture for k = 2.



Figure 4. Smooth bias change for k = 2.



Corollary 4.13 (Smoothness Corollary). Suppose we are given that Bb*(u) holds. For
all k ≥ 1 and ε′ > 0, and for all sufficiently large w, Smooth*_{k,ε′}(u) holds with probability
> 1 − ε′.

Proof. Let u_{−1} = u − (2w + 1) and u_1 = u + (2w + 1). The fact that u has borderline
bias has no impact on the distributions for N(u_{−1}) and N(u_1). Let I_3 = N(u_{−1}) and
I_4 = N(u_1). By the law of large numbers, for large w we can expect the proportion of α
nodes in each I_3(j : k) and I_4(j : k) to be close to 1/2. The result then follows from Lemma
4.11. □

While Smoothness Corollary 4.13 tells us what can be expected from the neighbourhood
of a node chosen uniformly at random from among those with borderline bias, this
does not immediately allow us to infer anything about what can be expected from the
neighbourhoods of each l_i and r_i. What we need is that, if we choose a node u uniformly at
random and then move left (or right) until we find a first node v with high bias, then with
high probability Smooth*_{k,ε′}(v) holds. Ideally, we would like to be able to apply Lemma
3.1, but that would involve establishing an appropriate form of condition (ii) from the
statement of the lemma. In order to work around this condition, it turns out that what
we need is a bound on the number of nodes of borderline bias which can be expected in
the neighbourhood of u such that Bb*(u) holds. The following lemma is a step in this
direction:

Definition 4.14. Given k ≥ 1, let N_k(u) be the interval [u − ⌈w/k⌉, u + ⌈w/k⌉]. Bb*(u, k, z)
holds if Bb*(u) holds and there are at most z many nodes with borderline bias in N_k(u)
in the initial configuration.

Lemma 4.15. Suppose we are given that Bb*(u) holds. For any ε′ > 0 there exists z such
that for 0 ≪ k ≪ w, Bb*(u, k, z) holds with probability > 1 − ε′.

Proof. We consider the case that u has positive bias ρ, and let θ be the proportion of
the nodes in N(u) which are of type α (so ρ = (2θ − 1)(2w + 1)). The case that u has
negative bias is almost identical.

First of all, we want to show that for sufficiently large z, if we step ⌊z/2⌋ many nodes
to the left (or right) of u, then we will very probably have bias which is well below ρ. The
argument here is very similar to the proof of Corollary 4.13: in forming the neighbourhood
of v = u − ⌊z/2⌋ we lose ⌊z/2⌋ many nodes from N(u), with the proportion of α nodes
being close to θ, and we gain the same number of new nodes, with the proportion of α
nodes here being close to 1/2. Arguing more precisely, let x_0 be the number of α nodes
amongst the nodes that we lose from N(u); its expected value is θ⌊z/2⌋. By applying Chebyshev's
inequality just as in the proof of Lemma 4.11, we conclude that for any ε″ > 0 and for
all sufficiently large z, P(|(x_0/⌊z/2⌋) − θ| > ε″) ≪ ε′. Now consider x_1, the number of α
nodes in N(v)∖N(u). The law of large numbers tells us that for any ε″ > 0 and for all
sufficiently large z, P(|(x_1/⌊z/2⌋) − 1/2| > ε″) ≪ ε′. Combining these facts gives that for
any m > 0 and for all sufficiently large z:

\[ P\left(\rho - \Theta(\mathcal{N}(v)) < m\right) \ll \varepsilon'. \]

So far then, we have considered moving ⌊z/2⌋ many nodes to the left of u, and have
concluded that the bias at this node v will very probably be well below ρ (a similar
argument also applies, of course, moving to the right). Now we have to show that as we
move left from v, so long as we remain within N_k(u) the bias will very probably remain
below ρ. In order to do this, we approximate the bias as we move to the left, by a biased
random walk. Recall that for a biased random walk starting at value −m ≤ −1, with
probability p > 1/2 of going down at each step and probability 1 − p of going up, the
probability of ever hitting 0 is:

\[ \left(\frac{1-p}{p}\right)^{m}. \]

So let us briefly adopt the approximation that nodes in N(u) are i.i.d. random variables,
each of which has probability θ of being of type α. Then, as we move left one position
from a location in N_k(u) to the left of u, the probability that the bias increases by 2 is
(1/2)(1 − θ), the probability that the bias remains the same is 1/2, and the probability that the
bias decreases by 2 is (1/2)θ. Removing those steps at which the bias does not change, we
get a biased random walk with probability θ of going down at each step and probability
1 − θ of going up. Choose θ′ with 1/2 < θ′ < θ. Now, dropping the false assumption of
independence, by taking k sufficiently large we ensure that as we take successive steps left
from v inside the interval N_k(u), at each step, no matter what has occurred at previous
steps, the probability of the bias increasing is less than (1/2)(1 − θ′), the probability of the
bias remaining the same is 1/2, and the probability of the bias decreasing is greater than
(1/2)θ′. Thus, if the bias at u − ⌊z/2⌋ is < ρ − m, then the probability that any nodes in the
interval [u − ⌈w/k⌉, u − ⌊z/2⌋) have high bias is less than ((1 − θ′)/θ′)^m.

Finally, let m be such that ((1 − θ′)/θ′)^m ≪ ε′, and let z be sufficiently large that, for
v = u − ⌊z/2⌋, P(ρ − Θ(N(v)) < m) ≪ ε′. □
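The hitting probability ((1 − p)/p)^m used in this proof is the classical gambler's ruin formula. The sketch below checks it deterministically, without simulation, by evolving the distribution of a walk that is absorbed at 0 and, as an approximation to the unbounded walk, also at a far-away level −depth. The function names and the parameters `depth` and `steps` are hypothetical choices for illustration.

```python
def hit_zero_probability(m, p):
    """Probability that a walk started at -m ever reaches 0, with down-probability p > 1/2."""
    return ((1 - p) / p) ** m


def absorbed_at_zero(m, p, depth, steps):
    """Mass absorbed at 0 within `steps` steps, for the walk also killed at -depth.

    mass[x] is the probability of currently being at -x and not yet absorbed.
    Requires 1 <= m < depth.
    """
    mass = [0.0] * (depth + 1)
    mass[m] = 1.0
    hit = 0.0
    for _ in range(steps):
        new = [0.0] * (depth + 1)
        for x in range(1, depth):
            if mass[x] == 0.0:
                continue
            up = mass[x] * (1 - p)      # step towards 0
            if x == 1:
                hit += up               # absorbed at 0
            else:
                new[x - 1] += up
            new[x + 1] += mass[x] * p   # step away from 0; level depth is a sink
        mass = new
    return hit


if __name__ == "__main__":
    print(hit_zero_probability(3, 0.7), absorbed_at_zero(3, 0.7, 60, 2000))
```

The truncated walk gives a slightly smaller value than the closed form, with the gap shrinking geometrically as `depth` grows.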

The reader might observe that the proof of Lemma 4.15 actually establishes something
stronger, and perhaps also more natural, than claimed. Suppose we are given that Bb*(u)
holds. The proof suffices to show that for any ε′ > 0 there exists z such that, for 0 ≪
k ≪ w, the probability there are more than z nodes of high bias in N_k(u) is less than ε′.
One might reasonably wonder why we did not define Bb*(u, k, z) to reflect this stronger
condition: that there are at most z many nodes of high bias (rather than borderline
bias) in N_k(u). The reason is that in later counting arguments we shall need failure of the
condition (in specific circumstances) to guarantee that there are, in fact, at least z many
nodes of borderline bias in the relevant neighbourhood for which the condition fails.

Of course, Lemma 4.15 only restricts the number of nodes of borderline bias that will
normally occur in the interval N_k(u). We have to be able to deal with a larger interval:






Definition 4.16. We say that Bb*(u, z) holds if Bb*(u) holds and there are at most z
many nodes of borderline bias in the interval [u − (2w + 1), u + (2w + 1)] in the initial
configuration.

Now note that if k_2 is sufficiently large compared to k_1, if ε′ is sufficiently small, and
if Bb*(u, k_1, z) and Smooth*_{k_2,ε′}(u) both hold, then in the initial configuration there are at
most z many nodes of borderline bias in the interval [u − (2w + 1), u + (2w + 1)]. Applying
Lemma 4.15 and Corollary 4.13 we therefore have:

Corollary 4.17 (Few nodes of borderline bias). Suppose we are given that Bb*(u) holds.
For any ε′ > 0, z may be chosen so that for all sufficiently large w, Bb*(u, z) holds with
probability > 1 − ε′.

We are now finally ready to prove that, with high probability when 0 ≪ w ≪ n, the
neighbourhoods of each l_i and r_i will satisfy the required smoothness conditions. We
will later be able to use this in order to prove that each l_i and r_i very probably either
originates a firewall or else subsides.

Lemma 4.18 (Smoothness for l_i and r_i). For any node u, let x_u be the first node to the
left of u which has high bias in the initial configuration. For any ε′ > 0 and k ≥ 1, if
0 ≪ w ≪ n and u is chosen uniformly at random, then x_u is defined and Smooth*_{k,ε′}(x_u)
holds with probability > 1 − ε′.

An analogous result holds when 'left' is replaced by 'right'. 

Proof. Applying Corollary 4.17, choose z such that, for sufficiently large w, if we are
given that v has borderline bias, then Bb*(v, z) fails to hold with probability ≪ ε′, i.e.
putting ε_1 = P(¬Bb*(v, z) | Bb*(v)), choose z so that ε_1 ≪ ε′ for sufficiently large w. Let
ε_2 = P(¬Smooth*_{k,ε′}(v) | Bb*(v)). Then Corollary 4.13 gives that for all sufficiently large w,
ε_2 ≪ ε′/z: just apply the statement of the corollary to k and ε″ ≪ ε′/z.
Now define

\[ \mu = \frac{P(\mathrm{Smooth}^{*}_{k,\varepsilon'}(v) \mid \mathrm{Bb}^{*}(v,z))}{P(\neg\mathrm{Smooth}^{*}_{k,\varepsilon'}(v) \mid \mathrm{Bb}^{*}(v,z))}. \]

Then

\[ \mu = \frac{P(\mathrm{Smooth}^{*}_{k,\varepsilon'}(v) \wedge \mathrm{Bb}^{*}(v,z) \mid \mathrm{Bb}^{*}(v))}{P(\neg\mathrm{Smooth}^{*}_{k,\varepsilon'}(v) \wedge \mathrm{Bb}^{*}(v,z) \mid \mathrm{Bb}^{*}(v))} \ge \frac{1-\varepsilon_1-\varepsilon_2}{\varepsilon_2} \gg \frac{z}{\varepsilon'}. \]

Thus for sufficiently large w, with probability approaching 1 as n → ∞, the ratio amongst
the nodes v such that Bb*(v, z) holds, between the number such that Smooth*_{k,ε′}(v) holds
and the number such that it does not, is much greater than z/ε′.

Consider the initial configuration. We define an iteration which assigns colours to nodes
as follows.

Step 0. Pick a node t_0 uniformly at random.

Step s + 1. Let v_s be the first node v to the left of t_s such that v = x_{t_s} or such that s > 0
and v = t_0. Carry out the instructions for the first case below which applies (¬ denotes
negation):

(1) If there exists no such v then terminate the iteration, and declare that it has
'ended prematurely'.

(2) If v_s = t_0 and s > 0 then make t_s undefined and terminate the iteration.

(3) If |v_s − t_s| < 2w + 1 then colour t_s pink.

(4) If Bb*(v_s, z) and Smooth*_{k,ε′}(v_s) both occur, then give t_s the colours white and blue.

(5) If Bb*(v_s, z) and ¬Smooth*_{k,ε′}(v_s) both occur, then give t_s the colours white and
red.

(6) If v_s satisfies ¬Bb*(v_s, z), then give t_s the colour black.

In cases (3)-(6), define t_{s+1} = v_s − (2w + 1), unless t_0 lies in the interval [v_s − (2w + 1), v_s),
in which case terminate the iteration.

This completes the description of the iteration.

First note that the probability that the iteration terminates prematurely can be made
arbitrarily small by taking n large, and similarly that we may assume there are t_s of all
colours. Now let S be the greatest s such that t_s is defined when the iteration terminates,
and let π be the proportion of the t_s, s ≤ S, such that t_s is coloured black. Amongst
the nodes u which have borderline bias, let ρ be the ratio between the number for which
Bb*(u, z) holds and the number for which it does not, so that for large n, ρ can be
expected to be close to (1 − ε_1)/ε_1. In order to find an upper bound for ρ, first let us
find an upper bound for the number of nodes u such that Bb*(u, z) holds. Let v_s be as
defined in the iteration. However we colour t_s, there can be at most z many nodes u in
I_s := (v_s − (2w + 1), v_s] which satisfy Bb*(u, z), giving an upper bound of (S + 1)z. Now,
let us find a lower bound for the number of nodes with borderline bias for which Bb*(u, z)
fails. If t_s is coloured pink, then we are not guaranteed any nodes in I_s of borderline bias
for which Bb*(u, z) fails. If t_s is coloured white, then the same applies. If t_s is coloured
black, however, we are guaranteed at least z many nodes of borderline bias in I_s for which
Bb*(u, z) fails. We therefore get a lower bound of (S + 1)πz. Thus:

\[ \rho \le \frac{z(S+1)}{\pi(S+1)z} = \frac{1}{\pi}. \]

We chose z previously, so that for all sufficiently large w, ε_1 ≪ ε′. Since ρ is close to
(1 − ε_1)/ε_1 for large n, we infer that for all sufficiently large w, with probability tending to
1 as n → ∞, we have ρ ≫ 1/ε′, so that π ≪ ε′. For large n, the probability t_s is coloured
pink is less than (2w + 1)e^{−w/d} (with d as in Definition 4.2). So, for sufficiently large w,
with probability tending to 1 as n → ∞, the proportion of the t_s which are not coloured
white is ≪ ε′.






Now let β be the proportion of the t_s which are coloured red. Amongst the nodes u
such that Bb*(u, z) holds, let γ be the ratio between the number for which Smooth*_{k,ε′}(u)
holds, and the number for which it does not, so that γ can be expected to be close to μ
(as defined earlier) for large n. Again, let us form an upper bound for γ. However t_s is
coloured, we get at most z many nodes in the interval I_s for which Bb*(u, z) ∧ Smooth*_{k,ε′}(u)
holds. If t_s is coloured red then we get at least one node in the interval I_s for which
Bb*(u, z) ∧ ¬Smooth*_{k,ε′}(u) holds. Thus:

\[ \gamma \le \frac{z(S+1)}{\beta(S+1)} = \frac{z}{\beta}. \]

We previously observed that, for all sufficiently large w, μ ≫ z/ε′. Thus, for all sufficiently
large w, with probability tending to 1 as n → ∞, we have that γ ≫ z/ε′, so that β ≪ ε′.
Since we have already established that, for 0 ≪ w ≪ n, the probability t_s is not coloured
white is ≪ ε′, we conclude that for 0 ≪ w ≪ n the probability t_s is coloured blue is > 1 − ε′.
Now for large n, the probability that t_s is coloured blue is less than the probability, for
u chosen uniformly at random, that x_u is defined and Bb*(x_u, z) ∧ Smooth*_{k,ε′}(x_u) holds,
which concludes the proof. □

So Lemma 4.18 tells us that, for some fixed k_1 ≥ 1 and ε′ > 0 (to be chosen later), we
will be able to define one of the Q_i (in fact this will be Q_2) to be the event that, when
1 ≤ i ≤ k_0 and u = l_i or u = r_i, Smooth*_{k_1,ε′}(u) holds. This event will then occur with
probability tending to 1 as w → ∞.

Shortly we shall go on to analyse what can be expected to occur in the interval [l_{k_0}, r_{k_0}]
once the process of type changing begins. First of all, however, we have to establish one
more thing for the initial configuration. Although we are working for fixed τ, in order to
be able to talk about intervals which are 'close' to being stable, it is also useful to be able
to talk about intervals which are stable with respect to τ′ < τ:

Definition 4.19. We say that an interval of length w + 1 is τ′-stable if it contains
τ′(2w + 1) many α nodes or τ′(2w + 1) many β nodes.

Lemma 4.20 (Nothing close to stable in the vicinity of u_0). Suppose κ < τ′ < τ. For
any u, let x_u be the first node to the left of u which, in the initial configuration, either has
high bias or else belongs to an interval which is τ′-stable. For any ε′ > 0, if 0 ≪ w ≪ n
then the following occurs with probability > 1 − ε′ for u chosen uniformly at random: x_u
is defined and no node in [x_u − 2w, x_u] belongs to a τ′-stable interval.
An analogous result holds when 'left' is replaced by 'right'.

Proof. For a moment, consider working with τ′ rather than τ. Recall that, in Section 3
we showed that for constants C_1 and C_2:

Thus, for any k and for sufficiently large w, the probability that a randomly chosen node 
is unhappy is more than kw times the probability that a randomly chosen node belongs 
to a r'-stable interval. Now, since r > r', this only makes unhappy nodes more likely 
in the initial configuration. Thus, working with r, for any k and for all sufficiently large 
w, the probability that a randomly chosen node is unhappy is more than kw times the 
probability that a randomly chosen node belongs to a r'-stable interval. The result then 
follows from Lemma 3.1. □ 
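The notion of a $\tau'$-stable interval from Definition 4.19 is easy to check mechanically on a given configuration. The following is a minimal sketch, not part of the original argument: the function name `stable_interval_starts`, the encoding of types as the characters 'a' and 'b', and the reading of "contains $\tau'(2w+1)$ many" as "contains at least that many" are all assumptions made for illustration.

```python
def stable_interval_starts(config, w, tau_prime):
    """Return the start indices of the length-(w+1) windows on the cycle
    that are tau'-stable in the sense of Definition 4.19.

    config: list of 'a' (alpha) / 'b' (beta) types arranged on a cycle.
    Assumption: a window is stable if it holds AT LEAST tau'(2w+1) nodes
    of a single type.
    """
    n = len(config)
    need = tau_prime * (2 * w + 1)
    starts = []
    for start in range(n):
        # Read w+1 consecutive nodes, wrapping around the cycle.
        window = [config[(start + k) % n] for k in range(w + 1)]
        alphas = window.count('a')
        betas = (w + 1) - alphas
        if alphas >= need or betas >= need:
            starts.append(start)
    return starts


# Seven alpha nodes followed by an alternating pattern, on a cycle of 20 nodes.
example = ['a'] * 6 + ['a', 'b'] * 7
stable = stable_interval_starts(example, w=4, tau_prime=0.5)
```

With $w = 4$ and $\tau' = 0.5$ a window of five nodes must be monochromatic to be stable, so only the windows lying inside the initial run of $\alpha$ nodes qualify.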

Although we have already established a lot concerning what can be expected from the
initial configuration, so far we have only gotten as far as precisely defining the first event
$Q_0$. Here are our second and third events:

Definition 4.21 (Defining $Q_1$). Once and for all, fix some $\tau_0$ with $\kappa < \tau_0 < \tau$. Let $Q_1$ be
the event that $l_i$ and $r_i$ are defined for all $1 \leq i < k_0$, and there do not exist any $\tau_0$-stable
subintervals of $[l_{k_0}, r_{k_0}]$ in the initial configuration.

Note that, by Lemma 4.20, when $k_0 \ll w$ the probability that $Q_1$ does not occur is
$< \epsilon$.

Definition 4.22 (Defining $Q_2$). Let $\tau_0$ be as in Definition 4.21. Once and for all, choose
$k_1$ such that $\frac{1}{k_1} \ll \tau - \tau_0$, and choose $k_2$ and $\epsilon_0$ such that $k_1 \ll k_2 \ll \frac{1}{\epsilon_0}$ and $k_2$ is a multiple
of $k_1$. We define $Q_2$ to be the event that all the $l_i$ and $r_i$ are defined for $1 \leq i < k_0$, and
that when $u = l_i$ or $u = r_i$, $\mathrm{Smooth}_{k_2,\epsilon_0}(u)$ holds.

By Lemma 4.18, when $k_0 \ll w$ the probability that $Q_2$ does not occur is $\ll \epsilon$.

The process to completion. Having established a clearer picture of what can be expected
from the initial configuration, we now look to understand what will happen in the
early stages, in the neighbourhood of each $l_i$ or $r_i$. First of all, we must establish that
these nodes can be expected to complete.

Lemma 4.23 ($l_i$ and $r_i$ complete). For any $\epsilon' > 0$, if $0 \ll w \ll n$ then for all $i \in [1, k_0)$,
$l_i$ and $r_i$ will (be defined and will) complete with probability $> 1 - \epsilon'$.

Proof. We form an upper bound for the probability that $l_i$ will fail to complete (the proof
for $r_i$ is essentially identical). Suppose that $Q_0$ holds, and that $w$ is large. As before, it is
convenient to define $l_0 = u_0$. Fix $i < k_0$ and let $I_1 = [l_{i+1}, l_i]$ and $I_2 = [l_i, l_{i-1}]$. Let $k$ be
the greatest such that, when $1 \leq j \leq k$, $I_1(j:k)$ and $I_2(j:k)$ are of length $> w+1$. For
$1 \leq j \leq \lfloor k/2 \rfloor$ define:

$$J_j := I_1(j:k) \cup I_1(j:k)^- \cup I_2(j:k) \cup I_2(j:k)^-.$$

Figure 5 below shows, as an example, what $J_2$ looks like:

[Figure 5. $J_2 = I_1(2:k) \cup I_1(2:k)^- \cup I_2(2:k) \cup I_2(2:k)^-$; the figure marks the nodes $l_{i+1}$ and $l_{i-1}$.]

For $1 \leq j \leq \lfloor k/2 \rfloor$, let $P_j$ be the event that a node in $J_j$ changes type, and note that
$P_3$ cannot occur until $P_2$ has occurred, $P_4$ cannot occur until $P_3$ has occurred, and so
on. Now the basic idea is that if completion fails to occur, then the sequence of events
$P_2, \ldots, P_{\lfloor k/2 \rfloor}$ must occur before any stage at which there are no unhappy nodes in the
neighbourhood of $l_i$.

We label certain stages as being a 'step towards completion', and certain others as being 
a 'step towards failure of completion' (while some stages are labelled as neither). These 
labels do not fully represent all aspects of the process, but contain enough information 
for our purposes. 

Steps towards completion. So long as there are unhappy nodes in the neighbourhood
of $l_i$, we label any stage at which a node in this neighbourhood swaps type as a step
towards completion. Subsequent to any stage at which $l_i$ completes, we label every stage
as a step towards completion.

Steps towards failure of completion. If $1 \leq j < \lfloor k/2 \rfloor$ is the greatest such that $P_j$
has occurred prior to stage $s$ or no $P_j$ has occurred and $j = 1$, and if $P_{j+1}$ occurs at stage
$s$, then we label $s$ a step towards failure of completion.

Now at any stage $s$ at which some $P_j$ for $j \leq \lfloor k/2 \rfloor$ is yet to occur, and at which there
are unhappy nodes in the neighbourhood of $l_i$, the probability of $s$ being a step towards
failure of completion is at most $4(w+1)$ times the probability of it being a step towards
completion (since there are at most $4(w+1)$ times as many nodes which, if chosen to
swap, will cause a step towards failure of completion, as those which will cause a step
towards completion). Choosing $d' > d$ we get that, since $Q_0$ holds, for all sufficiently
large $w$, $\lfloor k/2 \rfloor > e^{w/d'}$. We may therefore consider the first $e^{w/d'}$ many stages which are
steps either towards completion or failure of completion^ and, for large $w$, consider the
probability that at most $2w$ of these are steps towards completion. By the law of large
numbers, this probability tends to 0 as $w \to \infty$. Now if there do not exist unhappy nodes
of both types in the interval $[l_i - 2w, l_i + 2w]$ in the initial configuration (which is the case
with probability tending to 1 as $w \to \infty$, since it holds if $\mathrm{Smooth}^*(l_i, k', \epsilon'')$ holds for large
$k'$ and small $\epsilon''$), then $2w+1$ many steps towards completion prior to $P_{\lfloor k/2 \rfloor}$ occurring
suffice to ensure completion for $l_i$. □
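The appeal to the law of large numbers above can be made quantitative with a standard Chernoff bound. The following is a hedged sketch only: it treats the labelled steps as independent trials, each a step towards completion with probability at least $1/(4(w+1)+1)$, which is a simplification of the domination argument in the proof. The function name `completion_failure_bound` and the sample value of $d'$ are hypothetical.

```python
import math


def completion_failure_bound(w, d_prime):
    """Chernoff upper bound on the probability that at most 2w of the first
    floor(e^{w/d'}) labelled steps are steps towards completion.

    Assumption: each labelled step is (independently) a step towards
    completion with probability at least p = 1 / (4(w+1) + 1).
    """
    n_steps = math.floor(math.exp(w / d_prime))
    p = 1.0 / (4 * (w + 1) + 1)
    mean = n_steps * p
    if mean <= 2 * w:
        return 1.0  # the bound is vacuous when the mean is below the threshold
    eta = 1.0 - 2 * w / mean
    # Lower-tail Chernoff bound: P(X <= (1 - eta) * mean) <= exp(-eta^2 * mean / 2).
    return math.exp(-eta * eta * mean / 2.0)


bounds = {w: completion_failure_bound(w, d_prime=4.0) for w in (20, 40, 80)}
```

For small $w$ the bound is vacuous, but because the number of labelled steps grows exponentially in $w$ while the threshold $2w$ grows only linearly, the bound decays very rapidly once $w$ is moderately large.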



Definition 4.24 (Defining $Q_3$). We define $Q_3$ to be the event that all the $l_i$ and $r_i$ (for
$1 \leq i < k_0$) are defined and complete.



^In order to ensure the existence of $e^{w/d'}$ many such stages it is momentarily convenient to adopt
the convention that the process continues after there are no unhappy nodes, but with nothing
occurring at such stages, and that such stages are also labelled steps towards completion.






Lemma 4.25 (The required dichotomy). Suppose that $w$ is large and that $Q_j$ holds for
all $j \leq 3$. Then for $i < k_0$, $l_i$ and $r_i$ will each either subside or originate a firewall.

Proof. We prove the result for $l_i$, and the proof for $r_i$ is essentially identical.

Note first that the choice of $k_1$ in Definition 4.22 means, in particular, that $10w/k_1$
type changes in any given neighbourhood cannot create stable intervals (given that $Q_1$
holds). Note also that satisfaction of $Q_2$ suffices to ensure that there are not unhappy
nodes of both types in the neighbourhood of $l_i$ in the initial configuration. Now suppose
that $l_i$ completes at stage $s$ and has positive bias in the initial configuration (the case for
negative bias is essentially identical).

It is useful at this point to establish names for a number of relevant intervals. We let
$u_1 = l_i - (2w+1)$ and $u_2 = l_i + (2w+1)$. Then we define:

• $I = [u_1, u_2]$.

• $I_1 = [u_1, l_i]$, $I_2 = [l_i, u_2]$.

• $K^1_j = I_1(j:k_1)^- \cup I_2(j:k_1)$.

• $K^2_j = I_1(j:k_2)^- \cup I_2(j:k_2)$.

In Definition 4.22 we assumed that $k_2$ is a multiple of $k_1$, so we may let $m$ be such that
$k_2 = mk_1$. It is convenient to assume that $k_2$ is even. Now we divide into two cases.

$l_i$ subsides. First of all, suppose that at stage $s$, there is a $\beta$-node $u$ in the interval $K^1_1$.
Then $u$ must be happy at stage $s$. The fact that $Q_2$ is satisfied, together with the fact
that $l_i$ completes at stage $s$, means that prior to stage $s$, the only type changes in the
interval $J$ are from type $\beta$ to type $\alpha$, so that $u$ must be happy at every stage $\leq s$. Now,
since $k_1 \ll k_2 \ll \frac{1}{\epsilon_0}$ and $\mathrm{Smooth}_{k_2,\epsilon_0}(l_i)$ holds, any nodes in $I - (K^1_1 \cup K^1_2)$ have lower bias
than $u$, and hence are happy, in the initial configuration. It then follows by induction on
the stages $\leq s$ that no node in $J - (K^1_1 \cup K^1_2)$ changes type prior to stage $s$. In order to
see this suppose that it holds prior to stage $s' \leq s$. Then at stage $s'$, if $v \in I - (K^1_1 \cup K^1_2)$,
it still has lower bias than $u$ and so cannot change type from $\beta$ to $\alpha$, since $u$ is happy so $v$
must be. If $v \in J - I$, then $v$ changing type would contradict the fact that $l_i$ completes.

We therefore get at most $|K^1_1 \cup K^1_2| \leq 10w/k_1$ many type changes in the interval $J$
prior to completion. As observed above, this means that no stable intervals are created
and $l_i$ subsides, as required.

$l_i$ originates a firewall. So suppose instead that, at stage $s$, all nodes in the interval $K^1_1$
are of type $\alpha$. Given $m$ as above, another way of putting this is that all nodes in $\bigcup_{j \leq m} K^2_j$
are of type $\alpha$ at stage $s$. We now show by induction on $r \geq m$ that, when $r \leq k_2/2$,
any nodes in $K^2_r$ must be of type $\alpha$ at stage $s$, i.e. that all nodes in $\mathcal{N}(l_i)$ are $\alpha$ nodes
at stage $s$. So suppose that $m \leq r < k_2/2$ and that the hypothesis holds for all $r' \leq r$.
Consider $u \in K^2_{r+1}$. Let $\rho$ be the bias of $l_i$ in the initial configuration. First let us form a
lower bound for the bias of $u$ in the initial configuration. The fact that $Q_2$ holds means
that the leftmost and rightmost nodes in $K^2_r$ have bias at least $\rho - \frac{r}{k_2}\rho - \epsilon_0 w$. Then, since
the bias can change by at most 2 if we move left or right one node, we conclude that in
the initial configuration $u$ has bias

$$\rho_1 \geq \rho - \left( \frac{r}{k_2}\rho + \epsilon_0 w + \frac{2(2w+1)}{k_2} \right).$$



Now we have to take into account all of the $\beta$ nodes in $\bigcup_{j \leq r} K^2_j$ which have changed
type. In fact, so that we can be sure that each change of type affects the bias of $u$, we
shall consider just those which lie between $l_i$ and $u$. Let $\theta$ be the proportion of the nodes
in $\mathcal{N}(l_i)$ which are $\alpha$ nodes, so that $\rho = (2\theta - 1)(2w+1)$. Then the number of $\beta$ nodes
in $\bigcup_{j \leq r} K^2_j$ in the initial configuration, which lie between $l_i$ and $u$, is at least:

$$(1 - \theta - \epsilon_0)(2w+1)r/k_2.$$

Each change of type for one of these nodes means an increase of 2 in the bias of $u$, so that
at stage $s$, $u$ has bias:

$$\rho_2 \geq \rho - \left( \frac{r}{k_2}\rho + \epsilon_0 w + \frac{2(2w+1)}{k_2} \right) + (1 - \theta - \epsilon_0)2(2w+1)r/k_2$$

$$= \rho + (2w+1)\left[ \frac{(1-\theta)2r}{k_2} - \frac{\epsilon_0 w}{2w+1} - \frac{2}{k_2} - \frac{r}{k_2}(2\theta - 1) - \frac{\epsilon_0 2r}{k_2} \right].$$

We are left to compare the terms

$$\frac{(1-\theta)2r}{k_2}, \quad \frac{\epsilon_0 w}{2w+1}, \quad \frac{2}{k_2}, \quad \frac{r}{k_2}(2\theta - 1), \quad \frac{\epsilon_0 2r}{k_2}.$$

Since $1/k_2 \gg \epsilon_0$, the second term is much smaller than the third. Since $(1-\tau) \in (0.5, 0.65)$,
$r/k_2 \geq 1/k_1$ and $k_1 \ll k_2$, the first term is much larger than the third. Since $\epsilon_0$ is small,
the first term is also much larger than the last. The result then follows for large $w$, since
$2 - 2\theta$ is always more than double $2\theta - 1$ for $\kappa < \tau < \frac{1}{2}$, meaning that the first term is
more than double the fourth term, and thus $\rho_2 > \rho$, meaning that if $u$ is a $\beta$ node, it will
be unhappy. □
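The comparison of the five terms can be checked numerically. The sketch below evaluates the bracket $\frac{(1-\theta)2r}{k_2} - \frac{\epsilon_0 w}{2w+1} - \frac{2}{k_2} - \frac{r}{k_2}(2\theta-1) - \frac{\epsilon_0 2r}{k_2}$, whose positivity gives $\rho_2 > \rho$, for every $r$ in the induction range. The function names and the sample parameter values (chosen only so that $k_1 \ll k_2 \ll 1/\epsilon_0$) are assumptions made for illustration; they do not come from the paper.

```python
def bias_gain_terms(theta, r, k2, eps0, w):
    """Return the five terms compared at the end of the proof of Lemma 4.25."""
    first = (1 - theta) * 2 * r / k2
    second = eps0 * w / (2 * w + 1)
    third = 2 / k2
    fourth = (r / k2) * (2 * theta - 1)
    fifth = eps0 * 2 * r / k2
    return first, second, third, fourth, fifth


def bracket(theta, r, k2, eps0, w):
    """First term minus the other four; positive means rho_2 > rho."""
    first, *rest = bias_gain_terms(theta, r, k2, eps0, w)
    return first - sum(rest)


# Hypothetical parameters with k1 << k2 << 1/eps0 and k2 = m * k1.
k1, k2, eps0, w = 20, 400, 1e-5, 10**5
m = k2 // k1
values = [bracket(theta, r, k2, eps0, w)
          for theta in (0.55, 0.60, 0.64)
          for r in range(m, k2 // 2)]
```

With these sample values every entry of `values` is positive, in line with the claim that the first term dominates once $r \geq m$.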

Lemma 4.26 (Reasonable chance of firewall). There exists $\delta > 0$ which does not depend
on $w$, such that if $1 \leq i < k_0$, then $l_i$ originates a firewall with probability $> \delta$ (and
similarly for $r_i$).

Proof. Let $\theta$ be the proportion of the nodes in $\mathcal{N}(l_i)$ which are of type $\alpha$. Suppose that
$l_i$ has positive bias $\rho$. Just as we did in the proof of Lemma 4.15, let us initially adopt the
approximation that the nodes in $\mathcal{N}(l_i)$ are i.i.d. with probability $\theta$ of being an $\alpha$ node. Let
$K^1_1$ be as defined in the proof of Lemma 4.25. From the argument given there, it suffices
to show that there exists $\delta > 0$ which does not depend on $w$, such that the probability of
$l_i$ completing with all nodes in $K^1_1$ being of type $\alpha$ is greater than $\delta$.
We consider a process, consisting of $\ell := |K^1_1| + 1$ many steps.






Step 0. If $l_i$ is of type $\beta$ then define $\rho_0 = \rho + 2$, and otherwise define $\rho_0 = \rho$.

Step $s > 0$ (with $s < \ell$). Consider the two nodes $l_i - s$ and $l_i + s$. Let $x_s$ be the number
of these which are $\beta$ nodes (so $x_s \in \{0, 1, 2\}$). Let $y_s$ be $\theta^*(\mathcal{N}(l_i - s)) - \theta^*(\mathcal{N}(l_i - s + 1))$.
Then define $\rho_s = \rho_{s-1} + 2x_s + y_s$.

So, given the approximation that the nodes in $\mathcal{N}(l_i)$ are i.i.d. with probability $\theta$ of
being an $\alpha$ node, this gives a biased random walk. If $\rho_s > \rho$ for all $s$ then, given that $l_i$
completes, all the nodes to the left of $l_i$ in $K^1_1$ must be $\alpha$ nodes at completion. We may
also consider the mirror image process, in which $y_s$ is defined in terms of the bias at $l_i + s$
and $l_i + s - 1$ instead of the bias at $l_i - s$ and $l_i - s + 1$. Suppose that this process gives
a set of values $\rho'_s$. Again, if $\rho'_s > \rho$ for all $s$ then, given that $l_i$ completes, all the nodes
to the right of $l_i$ in $K^1_1$ must be $\alpha$ nodes at completion. It therefore suffices to show that
$\rho_s - \rho_{s-1}$ is more likely to be positive than negative, for $0 < s < \ell$.

The probability that $\rho_s - \rho_{s-1} = -2$ is $\frac{1}{2}\theta^3$. The probability that $\rho_s - \rho_{s-1} = 0$
is $\frac{1}{2}\theta \cdot 2\theta(1-\theta) + \frac{1}{2}\theta^2 = \theta^2(1-\theta) + \frac{1}{2}\theta^2$. Therefore the probability that
$\rho_s - \rho_{s-1} \in \{2, 4, 6\}$ is $1 - \frac{1}{2}\theta^3 - \theta^2(1-\theta) - \frac{1}{2}\theta^2 > \frac{1}{2}$.

Finally, we drop the false assumption of independence, in the same way as in the proof
of Lemma 4.15. In the random walk described above, let $p > \frac{1}{2}$ be the probability that
$\rho_s - \rho_{s-1} > 0$. Now choose $p'$ with $\frac{1}{2} < p' < p$. For sufficiently large $k_1$, when we drop
the assumption of independence, the actual probability that $\rho_s - \rho_{s-1} > 0$ at each step is
greater than $p'$, no matter what has occurred at previous steps. □
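The step probabilities of the idealised random walk are simple polynomials in $\theta$, so the claim that an upward step is more likely than not can be checked directly. The following sketch uses the probabilities $\frac{1}{2}\theta^3$ (step of $-2$) and $\theta^2(1-\theta) + \frac{1}{2}\theta^2$ (step of $0$) from the i.i.d. approximation; the function name `step_probabilities` and the grid of sample values of $\theta$ in $[0.5, 0.65]$ are assumptions made for illustration.

```python
def step_probabilities(theta):
    """Step distribution of the idealised walk under the i.i.d. approximation.

    Returns (P(step = -2), P(step = 0), P(step in {2, 4, 6})).
    """
    p_minus2 = 0.5 * theta ** 3
    p_zero = theta ** 2 * (1 - theta) + 0.5 * theta ** 2
    p_positive = 1.0 - p_minus2 - p_zero
    return p_minus2, p_zero, p_positive


# Sample values of theta between 0.50 and 0.65 (illustrative grid only).
thetas = [0.50 + 0.01 * k for k in range(16)]
positive = [step_probabilities(t)[2] for t in thetas]
```

On this grid the probability of a strictly positive step decreases as $\theta$ grows but stays above $\frac{1}{2}$ throughout, so the walk is biased upwards for every sampled value.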

Definition 4.27 (Defining $Q_4$ and choosing $k_0$). We let $Q_4$ be the event that one of the
$l_i$, $i < k_0$, is defined and originates a firewall, and that the same holds for some $r_j$, $j < k_0$.
According to Lemma 4.26, given $\epsilon > 0$ we can choose $k_0$ once and for all, which is large
enough such that the probability $Q_4$ does not occur is $\ll \epsilon$ for $0 \ll w \ll n$.

So far then, we have proved that, for any $\epsilon' > 0$, for $0 \ll w \ll n$, with probability
$> 1 - \epsilon'$ there will be a least $i$ such that $l_i$ is defined and originates a firewall, there will
be a least $j$ such that $r_j$ is defined and originates a firewall, and both $l_i$ and $r_j$ will be at
a distance $> e^{w/d}$ from $u_0$. It remains to prove that the firewalls originated at $l_i$ and $r_j$
will spread until $u_0$ is contained in one of them. The following lemma therefore completes
the proof for the simple model.

Lemma 4.28 ($u_0$ ultimately joins a firewall). Suppose that all $Q_i$ hold for $i \leq 4$. Let $i$
be the least such that $l_i$ is defined and originates a firewall, and let $j$ be least such that
$r_j$ is defined and originates a firewall. For any $\epsilon' > 0$, for all sufficiently large $w$, with
probability $> 1 - \epsilon'$, $u_0$ will eventually be contained in one of the two firewalls originated
at $l_i$ and $r_j$.

Proof. If the firewalls originated at $l_i$ and $r_j$ are of the same type, then the result is
immediate. So suppose that the firewall originated at $l_i$ is of type $\alpha$ and the firewall
originated at $r_j$ is of type $\beta$.

As we describe the argument we initially state two facts $(\dagger_1)$ and $(\dagger_2)$ without proof.
Once the outline of the argument is complete, we then provide proofs for these facts.






First, note that there are certain type changes within the interval $[l_i, r_j]$ which we are not
presently concerned with. If $u$ is in the neighbourhood of some $x$ which is $l_{i'}$ for $1 \leq i' < i$
or $r_{j'}$ for $1 \leq j' < j$, and changes type at a stage which is less than or equal to that at
which $x$ completes, then we say that this change of type is previous. Ignoring changes
which are previous, it is then easy to formalise the first stage $s(I)$ at which either of the
two firewalls originated at $l_i$ or $r_j$ 'have influence' on any given subinterval $I$ of $[l_i, r_j]$:
$s(I)$ is the first stage at which any node in $I$ has a change of type which is not previous.
The fact that all $Q_k$ for $k \leq 4$ are satisfied means that $s(I)$ must be defined for any
subinterval $I$ of $[l_i, r_j]$.

For $\gamma \in \{\alpha, \beta\}$ we also define $s^\gamma(I)$ to be the first stage at which a node in $I$ has a change
of type to $\gamma$ which is not previous; note that, unlike $s(I)$, these values may be undefined
(we write $s^\gamma(I)\downarrow$ to indicate that $s^\gamma(I)$ is defined). Now let $u_1 = u_0 - (2w+1)$ and let
$u_2 = u_0 + (2w+1)$. Let $I_0 = \mathcal{N}(u_0)$, $I_1 = \mathcal{N}(u_1)$ and let $I_2 = \mathcal{N}(u_2)$.

$(\dagger_1)$ For any $\epsilon' > 0$, for all sufficiently large $w$, the probability that there does not exist
$\gamma \in \{\alpha, \beta\}$ such that both $s^\gamma(I_1)\downarrow = s(I_1)$ and $s^\gamma(I_2)\downarrow = s(I_2)$, is $\ll \epsilon'$.

So suppose that there does exist such $\gamma$, and suppose $\gamma = \beta$ (the case $\gamma = \alpha$ is similar).

$(\dagger_2)$ Let $v \in [l_i, r_j]$ be the rightmost node such that either $v = l_i$ or $s^\alpha(\mathcal{N}(v))\downarrow < s(I_1)$.
For any $\epsilon' > 0$, for all sufficiently large $w$, the probability that $|u_0 - v| < e^{w/d}$ is
$< \epsilon'$.

Let us suppose that $|u_0 - v| > e^{w/d}$. Now consider $I_0$, the neighbourhood of $u_0$. Let $I_3$ be
the rightmost $w$ many nodes in this neighbourhood, and let $I_4$ be the leftmost $w$ many
nodes. Let $\delta = 1 - 2\tau$, so that if any node $x$ has bias 0 in the initial configuration then $w\delta$
many type changes in $\mathcal{N}(x)$, all to type $\beta$, suffice to give $x$ almost exactly borderline bias
(less than borderline bias plus 3 to be more precise). By the Central Limit Theorem, for
any e' > 0, for all sufficiently large w, the probability that any node in Iq or Ji has bias in 
the initial configuration which is of modulus > w^6 is ^ e'. Suppose that this is not the 
case. Now no node in I4 can become unhappy until there have been > ^Sw many changes 
to (3 type in 73 (with room to spare). Similarly no node in Ii can become unhappy until 
there have been at least ^Sw many changes to /3 type in I4. So at stage s(/i) we conclude 
that there have been at least ^Sw many changes to (3 type in the neighbourhood of uq. 
This means that all nodes in Iq now have bias at most w^S — w^6. Thus all a nodes in 
$I_0$ must be unhappy at stage $s(I_1)$. We can now define a notion of completion for $u_0$. We
say that $u_0$ completes at stage $s > s(I_1)$ if:

(1) No node in $I_0$ is unhappy at stage $s$, and this is not true for any $s' < s$ with
$s' > s(I_1)$.

(2) Letting $v$ be defined as in $(\dagger_2)$, there exists $x_0$ with $v + 2w < x_0 < u_0 - 2w$, such
that by the end of stage $s$, no node in $[x_0 - w, x_0]$ has had a change of type which
is not previous.






We can then argue, precisely as in Lemma 4.23, that $u_0$ almost certainly completes. In
this case it then follows that $u_0$ ultimately belongs to a firewall, which includes the interval
$[u_0, r_j]$. To complete the proof, we are therefore left to verify $(\dagger_1)$ and $(\dagger_2)$.

We verify $(\dagger_1)$ in such a way that almost precisely the same proof suffices to verify $(\dagger_2)$
also. Given $\epsilon' > 0$, choose $k \gg \frac{1}{\epsilon'}$. For all sufficiently large $w$, it follows from our earlier
analysis that, in fact, the probability that $|u_0 - l_i| < (k+1)\lceil e^{w/d} \rceil$ or $|u_0 - r_j| < (k+1)\lceil e^{w/d} \rceil$
is $\ll \epsilon'$. So suppose that neither of these possibilities holds. Now, for each instance $Z$
of the process on the circle of $n$ nodes for which there does not exist $\gamma \in \{\alpha, \beta\}$ such
that both $s^\gamma(I_1)\downarrow = s(I_1)$ and $s^\gamma(I_2)\downarrow = s(I_2)$, we wish to show that there are $k$-many
distinct others which we can label as corresponding to $Z$, each of which occurs with the
same probability as $Z$ and for which there does exist $\gamma$ of the kind described (and for
which all $Q_n$ for $n \leq 4$ hold for the same $u_0$). We also require that, for distinct $Z$ and $Z'$
for which $\gamma$ as required does not exist, the two corresponding sets of processes have no
intersection. The $k$-many distinct processes corresponding to $Z$ are easily defined. Given
that $\gamma$ does not exist, we must have that $s(I_1) = s^\alpha(I_1)$ and that $s(I_2) = s^\beta(I_2)$. The
processes corresponding to $Z$ are the 'rotations' of the entire process $Z$ by $n\lceil e^{w/d} \rceil$ many
nodes to the right, for $1 \leq n \leq k$. We therefore conclude that the probability that $\gamma$ does
not exist as required is $< \frac{1}{k} \ll \epsilon'$.

As remarked above, an almost identical proof then suffices to establish $(\dagger_2)$. □

The standard model. The difficulty that arises when one moves to the standard model
is that when there are different numbers of unhappy $\alpha$ and $\beta$ nodes, it is no longer true
that every unhappy node is equally likely to be chosen as part of a swapping pair. If there
are more unhappy $\alpha$ nodes than unhappy $\beta$ nodes at a given stage, for example, then
unhappy $\beta$ nodes belong to more unhappy pairs of opposite type than do their unhappy
$\alpha$ counterparts, and so are more likely to be chosen as part of an unhappy pair. This
potentially complicates our proof of Lemma 4.23 (that each $l_i$ and $r_i$ will almost certainly
complete), for example, since it may make 'steps towards completion' (as defined in that
proof) less likely. The solution, just as in [2], is to use technology developed by Wormald
[21] in order to show that we can sufficiently accurately model the discrete process with
a continuous one which is governed by a system of differential equations, and thereby
demonstrate that the number of unhappy nodes of each type actually remains very evenly
balanced. The remainder of this section is very similar to the corresponding argument in
[2]. We give the full proof since there are some minor differences, and to make the paper
as self-contained as possible. We aim to prove the following fact, which shows that our
proof for the simple model suffices for the standard model as well:

(o) Suppose $w$, $\tau$ (with $\kappa < \tau < \frac{1}{2}$) and $\epsilon > 0$ are fixed (where $\epsilon$ is the value we have
fixed throughout this section, and which played a role in the definition of $k_0$). Let
$l_{k_0}$ and $r_{k_0}$ be as defined previously. For any $\epsilon' > 0$, when $n$ is sufficiently large the
following holds with probability $> 1 - \epsilon'$: there exists a first stage at which there
are no unhappy nodes in the interval $[l_{k_0}, r_{k_0}]$ and at all stages up to this the total
number of unhappy $\alpha$ nodes divided by the total number of unhappy $\beta$ nodes lies
in the interval $[1 - \epsilon', 1 + \epsilon']$.

The very basic idea is as follows. For the remainder of the section we suppose that $w$, $\tau$
and $\epsilon$ are fixed. With high probability, for large $n$ we will have in the initial configuration
that the proportion of nodes which are unhappy and of type $\alpha$ is roughly equal to the
proportion of nodes which are unhappy and of type $\beta$. Briefly, however, let us make the
simplification that these proportions will be exactly equal. We then want to show that
the process can be sufficiently accurately modelled, for large $n$, by a system of differential
equations, which are entirely symmetric in $\alpha$ and $\beta$. This symmetry means that the
solution to the system of differential equations must describe an evolution in which there
are always precisely equal numbers of unhappy $\alpha$ and $\beta$ nodes. Of course, we then have to
deal with the fact that the numbers of unhappy $\alpha$ and $\beta$ nodes in the initial configuration
need not actually be exactly equal, but this turns out not to present too many problems.

As discussed in [2], there are, however, some further complications which arise immediately
as one looks to apply the Wormald machinery. The method applies to a process in
which the state of the system at any given moment in time is described by an $\ell$-dimensional
vector of real numbers, where $\ell$ remains fixed, and we then look to approximate the discrete
process by a continuous one as $n \to \infty$. How can we describe the configuration at
any given stage by an $\ell$-dimensional vector, where $\ell$ is independent from $n$?

Let $C_n$ be the graph which is a cycle of size $n$. Up until now, then, we have been considering
processes unfolding on each $C_n$. From now on we consider also a value $L$ which depends on $w$,
but not on $n$. Generally we shall work under the assumption that $w \ll L \ll n$. For the sake
of simplicity we assume that $L$ divides $n$ (but everything that follows is easily modified to
deal with the possibility that this is not the case). As we consider the process unfolding
on $C_n$ we consider also a parallel process on $G_n$, which is a disjoint union of cycles of
length $L$. More precisely, nodes $u$ and $v$ are connected in $G_n$ iff $\lfloor u/L \rfloor = \lfloor v/L \rfloor$ and
$u \equiv v \pm 1 \bmod L$.

In order to consider a parallel process on $G_n$, it is also convenient to
modify the way in which we count the stages of the process. In the process as previously
described, an unhappy pair of nodes of opposite type are selected at each stage, which
may then swap (in fact will swap for the values of $\tau$ considered here). Since we shall now
have a situation in which the same node $u$ may be unhappy in $C_n$ but happy in $G_n$, or
vice versa, it becomes convenient to consider a process in which two nodes are selected
uniformly at random at each stage, which will only then swap if they are of opposite type
and both are unhappy. Of course this makes no real difference to the evolution of the
system, except for the way in which we count the stages. The parallel process on $G_n$ then
unfolds as follows: when $u$ and $v$ are selected for a potential swap in $C_n$, they swap in
$G_n$ if they are both of opposite type and are unhappy in $G_n$.

Now let us see how the configuration of $G_n$ at any stage can be described by a
$2^L$-dimensional vector. We let $2^L$ denote the set of binary strings of length $L$. For each node
$u$ and each stage $s$ we define a string $\tau_{u,s} \in 2^L$: $\tau_{u,s}(0) = 1$ if $u$ is of type $\alpha$ at stage $s$,
otherwise $\tau_{u,s}(0) = 0$, and then $\tau_{u,s}(1) = 1$ if the node to the right of $u$ in $G_n$ is of type $\alpha$
at stage $s$, and so on. For each $\sigma \in 2^L$ we define $\zeta_\sigma(s)$ to be the number of nodes $u$ such
that $\tau_{u,s} = \sigma$. Then the $2^L$-dimensional vector $\zeta(s)$ in which the components are the values
$\zeta_\sigma(s)$ for $\sigma \in 2^L$ describes the configuration at stage $s$ up to isomorphism.
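To make the encoding concrete, the following is a minimal sketch of how the local strings and the count vector could be computed for a small configuration. It is an illustration under stated assumptions rather than part of the proof: types are encoded as 1 (type $\alpha$) and 0 (type $\beta$), "to the right" is taken to mean increasing index within a node's own cycle of length $L$, and the names `local_string` and `zeta` are hypothetical. The count vector is stored sparsely as a `Counter` instead of as a vector with $2^L$ entries.

```python
from collections import Counter


def local_string(types, u, L):
    """Binary string of length L read rightwards from u around u's own cycle in G_n.

    types[v] is 1 if node v is of type alpha and 0 if it is of type beta.
    Nodes u and v lie on the same cycle exactly when u // L == v // L.
    """
    block_start = (u // L) * L
    offset = u - block_start
    return tuple(types[block_start + (offset + k) % L] for k in range(L))


def zeta(types, L):
    """Sparse count vector: maps each string sigma to the number of nodes u
    whose local string equals sigma (strings with count 0 are omitted)."""
    if len(types) % L != 0:
        raise ValueError("L must divide the number of nodes")
    return Counter(local_string(types, u, L) for u in range(len(types)))


# Two cycles of length 3; the second configuration rotates the first cycle.
counts = zeta([1, 0, 0, 1, 1, 0], L=3)
rotated = zeta([0, 0, 1, 1, 1, 0], L=3)
```

Rotating one of the cycles leaves the counts unchanged, which illustrates the sense in which the vector describes the configuration only up to isomorphism.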

Our first task now is to show that for large $L$ and large $n$ compared to $L$, the processes on
$G_n$ and $C_n$ do not diverge too quickly. To this end we inductively define a set of tainted
nodes for each stage $s$, denoted $T(s)$. These are nodes whose neighbourhoods might
possibly look different in $G_n$ and $C_n$. $T(0)$ is the set of $u$ such that $-w < u \bmod L < w$
(we assume that the swapping process begins at stage $s = 1$). If $u$ and $v$ are chosen for a
potential swap at stage $s > 0$ and are both untainted, then $T(s) = T(s-1)$. Otherwise
$T(s)$ is the union of $T(s-1)$ with the set of all nodes which are in the neighbourhood
of $u$ or $v$ in either $G_n$ or $C_n$. Immediately it is clear that $|T(s) - T(s-1)| \leq 4(2w+1)$.
In fact, by counting more carefully those nodes which belong to the neighbourhoods of $u$
and $v$ in both graphs, we get $|T(s) - T(s-1)| \leq 2(3w+1)$, so that, since $w$ is large:

(4.28.1) $|T(s) - T(s-1)| \leq 7w$.

The next lemma provides precisely the kind of probabilistic bound on the number of 
tainted nodes at any given stage which we will need later. 

Lemma 4.29. The following conditions hold at every stage $s \geq 0$:

(1) The expected number of tainted nodes at stage $s$ is bounded above by $e^{14ws/n}\left(\frac{2w-1}{L}\right)n$.

(2) The probability that $|T(s)| > 2e^{14ws/n}\left(\frac{2w-1}{L}\right)n$ is at most $e^{-nw/(21L^2)}$.

Proof. Let $p(s) = |T(s)|$. In order to prove (1) first, let $u$ and $v$ be the nodes chosen for
a swap at stage $s+1$. Let $\gamma$ be the probability that either of these nodes is tainted (at
the end of stage $s$). The probability that $u$ is tainted is $p(s)/n$, and similarly for $v$, so
$\gamma \leq 2p(s)/n$. Now if neither of $u$ or $v$ is tainted then $p(s+1) = p(s)$, and otherwise, by
(4.28.1), $p(s+1) \leq p(s) + 7w$. We therefore have:

$$E[p(s+1) \mid p(s)] \leq (1-\gamma)p(s) + \gamma(p(s) + 7w) \leq p(s) + \frac{2p(s)}{n}7w = p(s)\left[1 + \frac{14w}{n}\right].$$

For $x > 0$, $1 + x < e^x$, giving:

$$E[p(s+1) \mid p(s)] \leq p(s) \cdot e^{14w/n}.$$

So far, then, we have established that the sequence of random variables $Y_s = p(s)e^{-14ws/n}$
is a supermartingale. This suffices to give (1) since $Y_0 = \left(\frac{2w-1}{L}\right)n$.

In order to establish (2) we make use of the following fact, as proved in [21] for example.
If $X_0, X_1, \ldots$ is a supermartingale with $X_0 = 0$ and $|X_i - X_{i-1}| \leq c_i$ for all $i$ and constants
$c_i$, then for all $\alpha > 0$,

(4.29.1) $$P(X_i \geq \alpha) \leq \exp\left(-\frac{\alpha^2}{2\sum_{j \leq i} c_j^2}\right).$$



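If we put $X_s = Y_s - \left(\frac{2w-1}{L}\right)n$ then the sequence $X_0, X_1, \ldots$ is a supermartingale and $X_0 = 0$.
In order to apply (4.29.1) we need a bound on the values $|X_j - X_{j-1}|$. To this end we
consider the sum:

$$|Y_s - e^{-14(s+1)w/n}p(s)| + |Y_{s+1} - e^{-14(s+1)w/n}p(s)|.$$

Now

$$Y_s - e^{-14(s+1)w/n}p(s) = Y_s(1 - e^{-14w/n}) \leq Y_s \cdot 14w/n,$$

since for $x > 0$, $1 - e^{-x} < x$. Also,

$$Y_{s+1} - e^{-14(s+1)w/n}p(s) = e^{-14(s+1)w/n}(p(s+1) - p(s)) \leq 7we^{-14(s+1)w/n},$$

by (4.28.1). Thus, since $p(s) \leq n$ gives $Y_s \leq ne^{-14sw/n}$,

$$|X_{s+1} - X_s| \leq 14we^{-14sw/n} + 7we^{-14(s+1)w/n} \leq 21we^{-14sw/n}.$$

We may therefore put $c_j = 21we^{-14(j-1)w/n}$. Now since for $0 < x < 1$, $\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$
we conclude that $\sum_{j \geq 1} c_j^2 \leq (21w)^2(1 - e^{-28w/n})^{-1} \leq 21wn$, the latter inequality
following from the fact that for small positive $x$, $1 - e^{-x} > 3x/4$.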
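Finally, applying (4.29.1), we get:

$$P\left(X_s + \left(\frac{2w-1}{L}\right)n > 2\left(\frac{2w-1}{L}\right)n\right) = P\left(X_s > \left(\frac{2w-1}{L}\right)n\right) \leq \exp\left(-\frac{n(2w-1)^2}{42wL^2}\right) < \exp\left(-\frac{nw}{21L^2}\right).$$

This establishes (2) as required. □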
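The constants appearing in the proof of Lemma 4.29 can be sanity-checked numerically. The sketch below sums the squared increment bounds $(21w\,e^{-14jw/n})^2$ over a finite number of stages and compares the total with $21wn$, and it compares the two exponents $\frac{n(2w-1)^2}{42wL^2}$ and $\frac{nw}{21L^2}$. The function names and the sample values of $w$, $n$ and $L$ are assumptions chosen only for illustration.

```python
import math


def sum_squared_increments(w, n, stages):
    """Sum of (21 w e^{-14 j w / n})^2 for j = 0, ..., stages - 1."""
    return sum((21 * w * math.exp(-14 * j * w / n)) ** 2 for j in range(stages))


def tail_exponents(w, n, L):
    """Exponent obtained from the martingale inequality, and its simplified lower bound."""
    exact = n * (2 * w - 1) ** 2 / (42 * w * L ** 2)
    simplified = n * w / (21 * L ** 2)
    return exact, simplified


def expected_tainted_bound(w, n, L, s):
    """Upper bound from part (1) on the expected number of tainted nodes at stage s."""
    return math.exp(14 * w * s / n) * (2 * w - 1) / L * n


# Hypothetical sample parameters with w << L << n.
w, n, L = 10, 10**5, 1000
total = sum_squared_increments(w, n, stages=50000)
exact, simplified = tail_exponents(w, n, L)
```

For these values the sum of squares stays below $21wn$ and the exponent from the martingale inequality is at least $\frac{nw}{21L^2}$, as the proof requires.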

So Lemma 4.29 tells us that for fixed $x$, we can ensure with high probability that the
proportion of nodes which are tainted by stage $xn$ is very small, so long as we take $L$ large
and $n$ large compared to $L$. With this in place, we now concentrate for a while on the
processes on the graphs $G_n$. In order to approximate these processes by the solution to a
system of differential equations, we first of all have to draw up reasonable candidates for
the differential equations to be used. To this end we begin by considering the conditional
expectations $E[\zeta_\sigma(s+1) \mid \zeta(s)]$. One can form an expression for this expectation, simply
by listing all of the different ways in which the number of nodes $u$ with $\tau_u = \sigma$ can increase
or decrease at a given stage and establishing their probabilities (we formally defined $\tau_{u,s}$
rather than $\tau_u$, but shall omit $s$ where no ambiguity results). Consider a proposed swap
between $u$ and $v$ which belong to different cycles, such that $\tau_u = \sigma'$ and $\tau_v = \sigma''$. Define
$a(\sigma, \sigma', \sigma'')$ to be the net change resulting from this proposed swap, in the number of nodes
$u'$ for which $\tau_{u'} = \sigma$. Note that the net change will be zero unless $u$ and $v$ are of opposite
type and are both unhappy, and that in any case $|a(\sigma, \sigma', \sigma'')| \leq 2L$. Then:

(4.29.2) $$E[\zeta_\sigma(s+1) - \zeta_\sigma(s) \mid \zeta(s)] = \sum_{\sigma', \sigma''} a(\sigma, \sigma', \sigma'') \frac{\zeta_{\sigma'}(s)\zeta_{\sigma''}(s)}{n^2} + O\left(\frac{L^2}{n}\right).$$

The $O\left(\frac{L^2}{n}\right)$ correction term accounts for the possibility that the nodes selected for a
potential swap might belong to the same cycle, the probability of this occurring being
$L/n$ and the net change then being bounded by $L$. We next perform some scaling, so
that the variables reach fixed functions in the limit as $n \to \infty$: for each $\sigma$ we want a real
function $z_\sigma(s)$ to model the behaviour of $\frac{1}{n}\zeta_\sigma(sn)$. Now (4.29.2) suggests the following
system of differential equations for the functions $z_\sigma$:

(4.29.3) $$z'_\sigma(s) = \sum_{\sigma', \sigma''} a(\sigma, \sigma', \sigma'') z_{\sigma'}(s) z_{\sigma''}(s).$$

(4.29.4) $$z_\sigma(0) = \frac{1}{n}\zeta_\sigma(0).$$

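Note that for any solution to (4.29.3) and (4.29.4), if we put $y(s) = \sum_\sigma z_\sigma(s)$, then:

$$y'(s) = \sum_\sigma z'_\sigma(s) = \sum_{\sigma', \sigma''} \sum_\sigma a(\sigma, \sigma', \sigma'') z_{\sigma'}(s) z_{\sigma''}(s) = \sum_{\sigma', \sigma''} 0 = 0.$$

Thus, since $y(0) = 1$, we also have $y(s) = 1$ for all $s$ in the domain. This in turn means
that:

(4.29.5) $$|z'_\sigma(s)| = \Big|\sum_{\sigma'} \sum_{\sigma''} a(\sigma, \sigma', \sigma'') z_{\sigma'}(s) z_{\sigma''}(s)\Big| \leq \Big|\sum_{\sigma'} \sum_{\sigma''} 2L z_{\sigma'}(s) z_{\sigma''}(s)\Big| = \Big|\sum_{\sigma'} 2L z_{\sigma'}(s)\Big| = 2L.$$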
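The conservation property $y'(s) = 0$ depends only on the fact that the coefficients $a(\sigma, \sigma', \sigma'')$ sum to zero over $\sigma$ for every fixed pair $(\sigma', \sigma'')$, since a swap never changes the total number of nodes. The following toy sketch illustrates this with a generic quadratic system integrated by Euler's method. The coefficients here are random numbers centred so that they sum to zero over the first index; they are purely hypothetical and are not the coefficients of the Schelling process, and the names `toy_coefficients` and `euler_solve` are illustrative.

```python
import random


def toy_coefficients(m, bound, seed=0):
    """Random a[i][j][k] with sum over i equal to zero for every (j, k),
    and |a[i][j][k]| <= bound. Purely illustrative coefficients."""
    rng = random.Random(seed)
    a = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for j in range(m):
        for k in range(m):
            column = [rng.uniform(-1.0, 1.0) for _ in range(m)]
            mean = sum(column) / m
            for i in range(m):
                # Centred entries lie in [-2, 2]; rescale so that |a| <= bound.
                a[i][j][k] = (column[i] - mean) * bound / 2.0
    return a


def euler_solve(a, z0, total_time, steps):
    """Euler integration of z_i' = sum_{j,k} a[i][j][k] * z_j * z_k."""
    m = len(z0)
    h = total_time / steps
    z = list(z0)
    for _ in range(steps):
        dz = [sum(a[i][j][k] * z[j] * z[k] for j in range(m) for k in range(m))
              for i in range(m)]
        z = [z[i] + h * dz[i] for i in range(m)]
    return z


a = toy_coefficients(m=4, bound=2.0)
z_final = euler_solve(a, [0.25, 0.25, 0.25, 0.25], total_time=0.1, steps=100)
```

Each Euler step changes the sum of the components by a multiple of the column sums of the coefficients, so the total stays equal to 1 up to rounding error, mirroring $y(s) = 1$.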

Suppose for now that we are interested in the first $xn$ steps of the process on $G_n$. Later
we shall choose a value for $x$ which suits our needs. For now though, this means that
we are interested in solutions to (4.29.3) and (4.29.4) for $s \in [0, x]$. Let $\Pi$ be the set of
points $(s, z_1, \ldots, z_{2^L})$ such that $s \in [0, x]$ and, for each $i \leq 2^L$, $|z_i| \leq 2Lx + 1$. Let $D$ be
a bounded connected open subset of $\mathbb{R}^{2^L+1}$ containing $\Pi$ and such that the minimum $\ell^\infty$
distance between the boundary of $D$ and any point in $\Pi$ is bounded below (and where
the $\ell^\infty$ distance between two vectors is the maximum difference between corresponding
components). For the remainder of this section, let all the strings in $2^L$ be enumerated
as $\sigma_1, \sigma_2, \ldots, \sigma_{2^L}$. Then by standard results in the theory of differential equations (see [14]
for example), the fact that each of the functions

$$\sum_{j,k} a(\sigma_i, \sigma_j, \sigma_k) z_j z_k$$

satisfies a Lipschitz condition for each argument $z_r$ in $D$, together with (4.29.5), means
that there exists a unique set of functions $\{z_{\sigma_i}(s) \mid \sigma_i \in 2^L\}$ defined on the interval $[0, x]$
and which satisfy (4.29.3) and (4.29.4) for all $s \in [0, x]$.

The following lemma is just a slightly stripped down version of Theorem 5.1 from [21]. 

Lemma 4.30. Let $D$ and $x$ be as above. Suppose that for all $n$, $\lambda(n) > L^2/n$ and that
as $n \to \infty$, $\lambda(n) \to 0$. Consider the unique set of functions $\{z_\sigma \mid \sigma \in 2^L\}$ defined on
the interval $[0,x]$ which satisfy (4.29.4) and (4.29.3) for all $s \in [0,x]$. Then the following
holds with probability $1 - O\!\left(\frac{2L}{\lambda(n)} \exp\!\left(-\frac{n\lambda(n)^3}{8L^3}\right)\right)$:

$\zeta_\sigma(s) = n z_\sigma(s/n) + O(n\lambda(n))$

uniformly for $0 \le s \le xn$.

Actually, in the statement of Theorem 5.1 in [21], the conclusion is only guaranteed for those
$s$ for which $(s, z_{\sigma_1}(s), \dots, z_{\sigma_{2^L}}(s))$ has not come within distance $C\lambda(n)$ of the boundary of $D$ for
some sufficiently large constant $C$ (where $C$ is independent of $n$). In our case, however,
(4.29.5) together with the way in which we defined $D$ removes these complications. Since
we required that the minimum distance between the boundary of $D$ and $\Pi$ be bounded below, and since
$\lambda(n) \to 0$ as $n \to \infty$, the required condition is automatically satisfied for all sufficiently
large $n$ and all $s \in [0,x]$ (and then the constant in the $O(\cdot)$ term of the probability bound can be
modified to ensure the theorem holds for all $n$).

Consider now the symmetry properties which must be satisfied by the functions $z_\sigma$.
For each $\sigma \in 2^L$ let $\bar\sigma$ be the string which results from changing every 1 in $\sigma$ to 0, and
changing every 0 to 1. For $z \in \mathbb{R}^{2^L}$, let $\iota(z)$ be the vector which results from swapping
the $i$th component with the $j$th component whenever $\bar\sigma_i = \sigma_j$ (recalling our enumeration
of the strings in $2^L$ from before). If we have $\iota(\zeta(0)) = \zeta(0)$ (which we probably
will not, but for large $n$ we shall have something close to it with high probability), then
the symmetry properties of (4.29.3) guarantee that for all $s \in [0,x]$, $\iota(z(s)) = z(s)$.
Define $u(\sigma) = 1$ if any node $u$ such that $\tau_u = \sigma$ is an unhappy $\alpha$ node, $u(\sigma) = -1$
if any node $u$ such that $\tau_u = \sigma$ is an unhappy $\beta$ node, and $u(\sigma) = 0$ otherwise. For
$z = (z_1, \dots, z_{2^L}) \in \mathbb{R}^{2^L}$ consider the linear function:

$\Lambda(z) = \sum_i u(\sigma_i) z_i.$

Then $\Lambda(\zeta(s)/n)$ is the difference at stage $s$, between the fraction of nodes which are
unhappy and of type $\alpha$, and the fraction which are unhappy and of type $\beta$. Let $z(s)$ be the
vector solution to (4.29.3) and (4.29.4), i.e. with $i$th component $z_{\sigma_i}(s)$. If $\iota(\zeta(0)) = \zeta(0)$
then we have that $\Lambda(z(s)) = 0$ for all $s \in [0,x]$.

In the above we considered the case that $\iota(\zeta(0)) = \zeta(0)$. We now have to deal with
the fact that this will probably not hold exactly. Recall that we are interested in the
processes on $C_n$ and $G_n$ up to stage $xn$ for some fixed $x$, which we are yet to specify, but
which will not depend on $n$ or $L$. For now suppose that we want to ensure the following
modified version of our ultimate aim (o) (one of the differences being that this modified
statement refers to a difference in numbers rather than a ratio):

(†) For fixed $x > 0$ and any $\epsilon_1 > 0$ the following holds for the process on $C_n$, with
probability $> 1 - \epsilon_1$ for all sufficiently large $n$: at all stages $\le xn$ the difference
between the fraction of nodes which are unhappy and of type $\alpha$ and the fraction
which are unhappy and of type $\beta$, is less than $\epsilon_1$.

To establish (†) take $\epsilon_2 < \epsilon_1$. It suffices to ensure that with probability $> 1 - \epsilon_2$,
$|\Lambda(z(s))| < \epsilon_2$ for all $s \in [0,x]$, since then we can take $L$ large and $n$ sufficiently large
compared to $L$ and apply Lemmas 4.29 and 4.30 (with any choice of $\lambda(n)$ meeting the
hypotheses of Lemma 4.30) to get the required result for $C_n$.

Let $S$ denote the set of vectors $y = (y_1, \dots, y_{2^L})$ such that the solution to (4.29.3) with
$z(0) = y$ satisfies $|\Lambda(z(s))| < \epsilon_2$ for all $s \in [0,x]$. Let $T = \{y \mid \iota(y) = y,\ \forall i\ y_i \ge 0,\ \sum_i y_i = 1\}$.
Every point of $T$ belongs to $S$, since $\Lambda(z(s)) = 0$ for solutions starting in $T$. By the
continuity properties of ordinary differential equations (see for
example [3]) it follows that $S$ is an open set, so since $T$ is compact there exists $d_0$ such
that every point within distance $d_0$ of $T$ belongs to $S$. It then follows directly from the
law of large numbers that when $n$ is sufficiently large compared to $L$, $\zeta(0)/n$ is within
distance $d_0$ of $T$ with probability $> 1 - \epsilon_2$. This establishes (†) as required.

Finally, we have to specify the value $x$. Recall that our aim is to prove (o), as specified
previously. Let $l_1$ be the length of the interval $[l_{k_0}, r_{k_0}]$. Note that (for fixed $w$ and $\epsilon$, where
$\epsilon$ is the value for which we are proving Theorem 1.2, and which plays a role in the value of
$k_0$, and for $\epsilon'$ as in (o)), we can take $l_2$ such that for all sufficiently large $n$, the probability
that either $|l_{k_0} - u_0| > l_2/2$ or $|u_0 - r_{k_0}| > l_2/2$ is $\ll \epsilon'$. Taking $l_3 \gg l_2/\epsilon'$, consider now
the first stage $s_1$ at which, for some $\gamma \in \{\alpha, \beta\}$, the number of unhappy $\gamma$ nodes as a
proportion of $n$ is less than $1/l_3$. Taking $\epsilon'' \ll \epsilon'$ and putting $x > (2w+1)(l_3)^2/\epsilon''$, suppose
towards a contradiction that $P(s_1 > xn) > \epsilon''$. In that case, with probability $> \epsilon''$, the
following holds at each stage $1 \le s \le xn$: given the configuration at the end of stage
$s - 1$, the probability at stage $s$ of choosing two nodes which do swap is at least $1/(l_3)^2$. The
expected number of stages $\le xn$ at which swaps do occur is therefore $> (2w + 1)n$, which
gives the required contradiction since, according to the observation made in Section 2,
there can be at most $(2w + 1)n$ stages at which a swap occurs. Applying (†) to $\epsilon_1 \ll \epsilon'/l_3$
then establishes that with probability approaching 1 as $n \to \infty$, $s_1 \le xn$ and at all stages
$\le xn$ the difference between the fraction of nodes which are unhappy and of type $\alpha$ and
the fraction which are unhappy and of type $\beta$, is less than $\epsilon_1$. When the latter conditions
hold, this means that at all stages up to $s_1$ the ratio between the number of unhappy $\alpha$
nodes and the number of unhappy $\beta$ nodes lies in the interval $[1 - \epsilon', 1 + \epsilon']$. Now choosing
$l_3 \gg l_2/\epsilon'$ as we did previously, means that, since $u_0$ was chosen uniformly at random, and
given that the total proportion of nodes which are unhappy at stage $s_1$ is at most of order $1/l_3$,
the probability that there are any unhappy nodes in the interval $[u_0 - l_2/2, u_0 + l_2/2]$
(or in the almost certainly smaller interval $[l_{k_0}, r_{k_0}]$) at stage $s_1$ is $\ll \epsilon'$. Thus we have
established (o), as required.
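The contradiction behind the choice of $x$ is a one-line computation. As a sketch, take $x > (2w+1)\,l_3^2/\epsilon''$ and suppose that, on an event of probability greater than $\epsilon''$, every stage up to $xn$ produces a swap with probability at least $1/l_3^2$. The count of stages below is the only quantity involved:

```latex
\[
\mathbb{E}\bigl[\#\{\text{stages} \le xn \text{ with a swap}\}\bigr]
\;\ge\; \epsilon'' \cdot xn \cdot \frac{1}{l_3^{2}}
\;>\; \epsilon'' \cdot \frac{(2w+1)\, l_3^{2}}{\epsilon''} \cdot \frac{n}{l_3^{2}}
\;=\; (2w+1)\, n .
\]
```

Since at most $(2w+1)n$ swaps can ever occur, the assumption $P(s_1 > xn) > \epsilon''$ cannot hold.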

5. The case $\tau = \frac12$

This case was dealt with in [2], for both the simple and standard models. Although we
have talked about threshold behaviour occurring at both $\tau = \kappa$ and $\tau = \frac12$, it is worth
noting that these two thresholds are of essentially different types. For a fixed large $w$, as
one increases the value of $\tau$ past the lower threshold $\kappa$, one will very suddenly observe
entirely different results for the final configuration: from a situation in which almost no
type changing occurs, one moves to a situation where $u$ chosen uniformly at random can
be expected to belong to a firewall of length exponential in $w$ in the final configuration. As
one increases $\tau$ gradually up to $\frac12$, however, what one observes is a smooth decrease in the
expected length of the firewall to which $u$ belongs, with the situation only reversing very
suddenly when $\tau > \frac12$. In order to observe the difference in behaviour between $\tau$ just
less than $\frac12$ and $\tau = \frac12$, one should consider instead these two fixed $\tau$, and gradually
increase $w$. Then, when $\tau < \frac12$, the expected length of the firewall to which $u$ belongs
grows exponentially, while for $\tau = \frac12$ it grows only polynomially in $w$. Figure 6 displays
this difference in behaviour, and also illustrates the distinct mechanisms involved for the
two cases. These diagrams illustrate the process exactly as in Figure 1, the difference
being that now $n = 1000000$ rather than $50000$. It is also interesting to observe the
difference in the way that the firewall spreads as a 'wave' emanating from unhappy nodes
in the initial configuration for different $\tau < \frac12$. When $\tau$ is just above $\kappa$ the wave is sharply
defined (as illustrated in Figure 1), much less so when $\tau$ is just below $\frac12$.

6. The case $\tau > \frac12$

We consider first the standard model. Note that if $\frac12 < \tau \le \frac{w+1}{2w+1}$ then the process is
identical to that for $\tau = \frac12$. We therefore assume in what follows that $w$ is sufficiently
large to ensure $\tau > \frac{w+1}{2w+1}$, which is equivalent to the condition that adjacent nodes of
opposite types cannot both be happy. We show that with probability $1 - \epsilon$ the starting
configuration is such that complete segregation results with probability 1, where $\epsilon \to 0$ as
$n \to \infty$. Recall that complete segregation refers to any configuration in which all $\alpha$ nodes
belong to a single run, and that, as observed in Section 2, once a completely segregated
configuration is reached all future configurations must be completely segregated.

Figure 6. The second kind of threshold behaviour. Panels: $w = 3000$, $\tau = 0.48$ and $w = 3000$, $\tau = 0.50$.

Let $x$ be the number of $\alpha$ nodes in the initial configuration, and let $x_p = x/n$. Our
task is to show that for sufficiently large $n$ it is possible to reach a completely segregated
configuration from any other, so long as $x_p$ is close to $\frac12$. To prove this, however, one must
be able to ensure the existence of unhappy individuals of both types at each step along
the way. To this end we first of all prove that, with probability tending to 1 as $n \to \infty$,
$x$ satisfies the property that there are unhappy $\alpha$ nodes in any configuration on a ring of
size $n$ with $x$ many $\alpha$ nodes. While this may sound intuitively rather obvious, it (perhaps
surprisingly) takes a little work to prove.
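The notions used in this section can be made concrete with a few helper functions. This is a minimal sketch under stated assumptions, not the authors' code: a configuration is a sequence of `'a'`/`'b'` labels on a ring, the neighbourhood of a node consists of the $2w+1$ nodes within distance $w$ (the node itself included), and a node is happy when the number of nodes of its own type in that neighbourhood is at least $\tau(2w+1)$. The function names are hypothetical.

```python
def own_type_count(config, u, w):
    """Nodes of u's type among the 2w+1 nodes within distance w of u.

    Assumes len(config) >= 2*w + 1 so that no node is counted twice.
    """
    n = len(config)
    return sum(1 for d in range(-w, w + 1) if config[(u + d) % n] == config[u])


def is_happy(config, u, w, tau):
    """Assumed happiness rule: own-type share of the neighbourhood >= tau."""
    return own_type_count(config, u, w) >= tau * (2 * w + 1)


def is_completely_segregated(config):
    """True when all nodes of each type form a single run on the ring."""
    n = len(config)
    boundaries = sum(1 for i in range(n) if config[i] != config[(i + 1) % n])
    return boundaries <= 2


if __name__ == "__main__":
    ring = list("aaabbb")
    print(is_completely_segregated(ring))
    print([is_happy(ring, u, w=1, tau=0.6) for u in range(len(ring))])
```

A ring is completely segregated exactly when the type changes at most twice as one walks around it, which is what `is_completely_segregated` counts.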






Proving that (when $x_p$ is close to $\frac12$) there are unhappy $\alpha$ nodes in any configuration.
Rather than considering the nodes to be arranged in a circle, it is temporarily
useful to suppose instead that they are arranged in a line which extends infinitely far to
the right, with positions indexed by natural numbers (starting with 0). So we consider a
fixed configuration in which each natural number is either of type $\alpha$ or $\beta$. It is convenient
to suppose further that the node at $w$ is of type $\alpha$, and that all $\alpha$ nodes $\ge w$ are happy.
From these assumptions, we will deduce the following bound on the proportion of $\alpha$ nodes:

(†) For any $\epsilon > 0$ there exists $\ell$ such that for all $\ell' \ge \ell$, the proportion of $\alpha$ nodes in
the interval $[0,\ell']$ is $> \frac12 + \frac{1}{2w} - \epsilon$.

This suffices because the existence of any circle in which all $\alpha$ nodes are happy and in
which $0 < x_p < \frac12 + \frac{1}{2w}$ (the latter condition on $x_p$ being satisfied almost certainly for large
enough $n$, by the law of large numbers) would give a contradiction, since this circle could
then be cut and infinitely many copies placed end to end, giving a counterexample to
(†).

In order to prove (†), it will be useful to consider the right and left parts of any
neighbourhood $\mathcal{N}(u)$ for $u \ge w$. We define $\mathcal{L}(u)$ (respectively $\mathcal{R}(u)$) to be the leftmost
(rightmost) $w$-many nodes in $\mathcal{N}(u)$. Note that any $\alpha$ node $u \ge w$ must have $\alpha$ nodes in
both $\mathcal{L}(u)$ and $\mathcal{R}(u)$. Also, any $\beta$ node $v > w$ must have $\alpha$ nodes in $\mathcal{L}(v)$ and in $\mathcal{R}(v)$,
otherwise the next $\alpha$ node to the left or right (respectively) would not be happy.

We define a sequence of $\alpha$ nodes. Let $u_0 = w$. Given $u_k$ define $u_{k+1}$ to be the rightmost
$\alpha$ node in $\mathcal{N}_k := \mathcal{N}(u_k)$. Also define $\mathcal{L}_k := \mathcal{L}(u_k)$ and $\mathcal{R}_k := \mathcal{R}(u_k)$. Now, for $m \ge 1$
define $I_m := \bigcup_{k=0}^{m} \mathcal{N}_k$ and $S_m := \sum_{k=0}^{m} \Theta(\mathcal{N}_k)$. Since each $u_k$ is happy, $\Theta(\mathcal{N}_k) \ge 3$,
giving $S_m \ge 3(m + 1)$. This doesn't immediately tell us the bias on the interval $I_m$,
however, because nodes may have been counted multiple times in forming the sum $S_m$.
We therefore want to consider the way in which the neighbourhoods $\mathcal{N}_k$ overlap. For
$k \ge 1$, define $\mathcal{N}'_k := \mathcal{R}_{k-1} \cap \mathcal{L}_{k+1}$ $(= \mathcal{N}_{k-1} \cap \mathcal{N}_{k+1})$. Notice that $u_k \in \mathcal{N}'_k$, but that
$u_{k-1} \notin \mathcal{N}'_k$ and $u_{k+1} \notin \mathcal{N}'_k$. We similarly partition each $\mathcal{N}'_k$ into a left and right part.
Define $\mathcal{L}'_k := \mathcal{N}'_k \cap \mathcal{L}_k$ and $\mathcal{R}'_k := \mathcal{N}'_k \cap \mathcal{R}_k$. It is immediate that if $\mathcal{R}'_k$ is non-empty then
all nodes in it are of type $\beta$. Now we partition the nodes in $I_m$ according to the number
of times they are counted in forming the sum $S_m$, i.e. the number of $\mathcal{N}_k$ that they belong
to for $k \le m$. We define $J_r^m$ to be the set of $u \in I_m$ which belong to exactly $r$ distinct
neighbourhoods $\mathcal{N}_k$ for $k \le m$. Notice that $\bigcup_{r \ge 3} J_r^m = \bigcup_{k=1}^{m-1} \mathcal{N}'_k$. On the other hand,
$u_k \in \bigcup_{r \ge 3} J_r^m$ for $1 \le k \le m - 1$. We now want to assess the size and bias of each of the
various $J_r^m$.

$J_1^m$. We have $J_1^m \subseteq \mathcal{L}(u_0) \cup \mathcal{R}(u_m)$, so $|J_1^m| \le 2w$.

$J_4^m$. We show that only $\beta$ nodes can belong to $J_4^m$. Suppose that $u \in J_4^m$ is of type
$\alpha$. Then $u \in \mathcal{N}'_k$ for some $k$ with $1 \le k \le m - 1$, and by the remark above $u \notin \mathcal{R}'_k$. So
$u = u_k$ or $u \in \mathcal{L}'_k$. In either case, $u \in \mathcal{N}_{k-2}$ will contradict the definition of $u_{k-1}$, while
$u \in \mathcal{N}_{k+2}$ will contradict the definition of $u_{k+1}$, giving the required contradiction.






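$J_r^m$, $r \ge 5$. Suppose $u \in \mathcal{N}_{k-2} \cap \mathcal{N}_{k-1} \cap \mathcal{N}_k \cap \mathcal{N}_{k+1} \cap \mathcal{N}_{k+2}$. Then either $u_k \in \mathcal{N}_{k-2}$,
contradicting the definition of $u_{k-1}$, or $u_k \in \mathcal{N}_{k+2}$, contradicting that of $u_{k+1}$. Thus $J_r^m$ is
empty for $r \ge 5$.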

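$J_3^m$. We claim that $\Theta(J_3^m) \le m + w$. From our analysis of $J_r^m$ for $r > 3$ it follows that
we must have $u_k \in J_3^m$ for $1 \le k \le m - 1$. So $\Theta(J_3^m) \le m - 1 + \sum_{k=1}^{m-1} |\mathcal{L}'_k| - \sum_{k=1}^{m-1} |\mathcal{R}'_k|$.
It therefore suffices to show that $\sum_{k=1}^{m-1} (|\mathcal{L}'_k| - |\mathcal{R}'_k|)$ is bounded by $w$. For $k \ge 1$ define
$d_k := u_k - u_{k-1}$. Then $|\mathcal{R}'_k| = w - d_k$ and $|\mathcal{L}'_k| = w - d_{k+1}$. Hence $|\mathcal{L}'_k| = |\mathcal{R}'_{k+1}|$, and
$\sum_{k=1}^{m-1} (|\mathcal{L}'_k| - |\mathcal{R}'_k|) = |\mathcal{L}'_{m-1}| - |\mathcal{R}'_1| \le w$, proving the claim.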

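Putting these facts together, we get that $2\Theta(I_m) \ge 3m - 2w - m - w = 2m - 3w$,
so $\Theta(I_m) \ge m - \frac32 w$. For $m > \frac32 w$ choose a partition $I_m = \Pi_0 \cup \Pi_1 \cup \Pi_2$, so that
$|\Pi_0| = |\Pi_1|$, $\Pi_0$ contains only $\beta$ nodes, and $\Pi_1$ and $\Pi_2$ contain only $\alpha$ nodes (meaning
that $|\Pi_2| \ge m - \frac32 w$). Then since $|I_m| \le 2w + 1 + mw$, the proportion of the nodes in
$I_m$ which belong to $\Pi_2$ is at least

$\frac{m - \frac32 w}{2w + 1 + mw} \ge \frac{m}{m+3} \cdot \frac{1}{w} - \frac{3}{2m+6},$

which tends to $\frac{1}{w}$ as $m \to \infty$. Since the proportion of $\alpha$ nodes in $I_m$ is $\frac12$ plus half the
proportion of nodes belonging to $\Pi_2$, for each $\epsilon > 0$ there exists $m$ such that for all
$m' \ge m$, the proportion of $\alpha$ nodes in $I_{m'}$ is $> \frac12 + \frac{1}{2w} - \epsilon$, giving (†), as required.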
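The step from the bounds on the sets $J_r^m$ to the inequality for $2\Theta(I_m)$ can be spelled out as follows. This is a sketch which assumes, as the usage above suggests, that the bias $\Theta$ of a set is its number of $\alpha$ nodes minus its number of $\beta$ nodes (so that $\Theta$ is additive over disjoint sets and $\Theta(X) \ge -|X|$), and that a node of $J_r^m$ is counted exactly $r$ times in $S_m$.

```latex
\begin{align*}
S_m &= \sum_{r=1}^{4} r\,\Theta(J_r^m)
     = 2\,\Theta(I_m) - \Theta(J_1^m) + \Theta(J_3^m) + 2\,\Theta(J_4^m), \\
2\,\Theta(I_m) &= S_m + \Theta(J_1^m) - \Theta(J_3^m) - 2\,\Theta(J_4^m) \\
  &\ge 3(m+1) - 2w - (m+w) - 0 \;\ge\; 2m - 3w, \\
\frac{\#\{\alpha \text{ nodes in } I_m\}}{|I_m|}
  &= \frac12 + \frac{\Theta(I_m)}{2\,|I_m|} .
\end{align*}
```

The three estimates used in the second line are $\Theta(J_1^m) \ge -|J_1^m| \ge -2w$, $\Theta(J_3^m) \le m + w$, and $\Theta(J_4^m) \le 0$ because $J_4^m$ contains only $\beta$ nodes. The last line is what converts a lower bound on the bias into a lower bound on the proportion of $\alpha$ nodes.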

Actually, it is convenient to be able to have a little more than the existence of unhappy
nodes of both types to work with. In the analysis that follows it will be convenient to
work with configurations in which we are guaranteed the existence of unhappy nodes
outside any interval of length $4w + 1$. We can get this at the expense of assuming $n$
to be reasonably large compared to $w$. Given any circle configuration in which all $\alpha$
nodes outside an interval of length $4w + 1$ are happy, cut the circle at the left end of the
interval of length $4w + 1$ and consider the individuals in the circle to lie in a line with sites
indexed by natural numbers $< n$ (so those in the interval of length $4w + 1$ occupy $[0, 4w]$).
Performing an almost identical analysis we still conclude that, for any $\epsilon > 0$, so long as $n$
is sufficiently large, there exists $\ell$ such that for all $\ell' \ge \ell$ (with $\ell' < n$), the proportion of
$\alpha$ nodes in the interval $[0, \ell']$ is $> \frac12 + \frac{1}{2w} - \epsilon$. We conclude that with probability $1 - \epsilon$ the
initial configuration on a circle of size $n$ will have a value $x$ sufficient to ensure unhappy
nodes of both types outside any interval of length $4w + 1$ in any configuration with the
same number of $\alpha$ and $\beta$ nodes, where $\epsilon \to 0$ as $n \to \infty$.
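For very small rings the claim that "all $\alpha$ nodes happy" forces a large proportion of $\alpha$ nodes can be checked by brute force. The sketch below is illustrative only and rests on assumptions not fixed by the text above: a node is taken to be happy when at least $w+2$ of the $2w+1$ nodes in its neighbourhood (itself included) share its type, which corresponds to $\tau$ just above $\frac{w+1}{2w+1}$, and the function names are hypothetical. It enumerates every configuration on a ring of size $n$ and reports the smallest $\alpha$ fraction among those with at least one $\alpha$ node and no unhappy $\alpha$ node, for comparison with the value $\frac12 + \frac{1}{2w}$ that the bias computation suggests.

```python
from itertools import product


def all_alpha_happy(config, w, need):
    """True if every alpha node (encoded as 1) has at least `need`
    alpha nodes among the 2w+1 nodes within distance w on the ring."""
    n = len(config)
    for u in range(n):
        if config[u] == 1:
            count = sum(config[(u + d) % n] for d in range(-w, w + 1))
            if count < need:
                return False
    return True


def min_alpha_fraction(n, w):
    """Smallest alpha fraction over ring configurations of size n that
    contain an alpha node and no unhappy alpha node.

    Assumes n >= 2*w + 1 and the happiness rule 'at least w+2 own type'.
    """
    need = w + 2
    best = None
    for config in product((0, 1), repeat=n):
        k = sum(config)
        if k == 0:
            continue
        if all_alpha_happy(config, w, need):
            frac = k / n
            if best is None or frac < best:
                best = frac
    return best


if __name__ == "__main__":
    for n, w in [(7, 1), (10, 2)]:
        print(n, w, min_alpha_fraction(n, w), 0.5 + 1 / (2 * w))
```

The enumeration is exponential in $n$, so it is only a sanity check for toy sizes and says nothing about the asymptotic regime treated in the proof.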

Moving to complete segregation. So now we return to considering nodes arranged in a
circle, and from now on we assume that all configurations considered have unhappy nodes
of both types outside any given interval of length $4w + 1$. We build a list of configurations
from which it is possible to reach complete segregation. Consider first any configuration
which is not completely segregated, but which has a run of length at least $2w$. Without
loss of generality, suppose that this is a run of $\alpha$ nodes occupying the interval $[a, b]$, where
this interval is chosen to be of maximum possible length. If the nodes $a$ and $b$ are both
happy then the length of the interval ensures that all nodes in the run are happy; this
follows by induction on the distance from the edge of the interval. In this case let $u$ be an
unhappy $\alpha$ node and let $c \in \{a, b\}$ be at distance at least $w + 1$ from $u$. Then $u$ and the $\beta$
neighbour of $c$ may legally be swapped, increasing the length of the run by at least 1. So
suppose instead that at least one of the individuals $a$ and $b$ is not happy, and without loss
of generality suppose that $a$ has bias less than or equal to $b$. Then $a$ and $b + 1$ may legally
be swapped. Performing this swap causes position $b + 1$ to have at least the same bias as
$b$ did before the swap, and causes $a + 1$ to have at most the same bias as $a$ did before the
swap. Thus, the swap has the effect of shifting the run one position to the right and may
be repeated until the length of the run is increased by at least 1, i.e. for successive $i \ge 0$
we can swap the nodes $a + i$ and $b + i + 1$, so long as the latter is of type $\beta$. At the first
stage at which the latter is of type $\alpha$ the length of the run has been increased. Putting
these observations together, we conclude that from any configuration which has a run of
length at least $2w$ it is possible to reach full segregation.
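The case analysis here is driven by the length of the longest run on the ring. As a small illustration (a sketch with a hypothetical function name, assuming a configuration is a sequence of type labels arranged cyclically), the longest run can be located in a single pass by starting the scan at a position where the type changes, so that no run is split by the wrap-around:

```python
def longest_run(config):
    """Return (start, length) of a longest run of equal types on the ring."""
    n = len(config)
    if all(c == config[0] for c in config):
        return 0, n  # a single run occupies the whole ring
    # Begin at a boundary so that no run wraps past the scan origin.
    start = next(i for i in range(n) if config[i] != config[i - 1])
    best_start, best_len = start, 0
    i = 0
    while i < n:
        pos = (start + i) % n
        length = 1
        while i + length < n and config[(start + i + length) % n] == config[pos]:
            length += 1
        if length > best_len:
            best_start, best_len = pos, length
        i += length
    return best_start, best_len


if __name__ == "__main__":
    w = 2
    start, length = longest_run("aabbbaa")
    print(start, length, length >= 2 * w)
```

Comparing the returned length with $2w$ and $w$ tells which of the cases of the argument applies to a given configuration.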

Next consider a configuration in which the longest run $[a, b]$ is of length at least $w$, but
strictly less than $2w$. We shall suppose that $[a, b]$ contains $\alpha$ nodes; the case for $\beta$ nodes
is similar. Let $c$ be the first $\alpha$ node strictly to the left of $a$. If $c$ is unhappy, then we may
legally swap $c$ and $a - 1$, strictly increasing the length of the longest run. If $c$ is happy
then the distance between $c$ and $a$ is at most $w$ and we may successively swap unhappy
$\alpha$ nodes from outside the interval $[c - w, a + 2w]$ with the nodes $c + i$ for $1 \le i < a - c$
(starting with $i = 1$ and proceeding in order), in order to strictly increase the length of
the longest run. This follows because as each node $c + i$ performs the swap, it will become
happy.

It remains to show that we can always move to a configuration with a run of length at
least $w$. Given any configuration (satisfying the condition that any other configuration
reached by swapping unhappy nodes has unhappy nodes of both types outside any given
interval of length $4w + 1$), we first of all perform a procedure which selects a number of
individuals with relatively unfavourable bias. Let $u_0$ be an $\alpha$ node with least possible
bias. Given $u_k$ for $k < w^2$ choose $u_{k+1}$ outside $\bigcup_{j \le k} \mathcal{N}(u_j)$ which has least possible bias
amongst all the $\alpha$ nodes at such sites (we can suppose that $n$ is large enough that such a
choice is possible). Once $u_k$ is defined for each $k \le w^2$, choose $v_0$ outside $\bigcup_{j \le w^2} \mathcal{N}(u_j)$,
which has the greatest possible bias amongst all $\beta$ nodes at such sites. Given $v_k$ for $k < w^2$
choose $v_{k+1}$ outside $\bigcup_{j \le w^2} \mathcal{N}(u_j)$ and outside $\bigcup_{j \le k} \mathcal{N}(v_j)$ which has greatest possible bias
amongst all the $\beta$ nodes at such sites. Now choose an interval $[a, b]$ of length $w$ such that
$[a - w, b + w]$ has no intersection with any of the neighbourhoods $\mathcal{N}(u_j)$ or $\mathcal{N}(v_j)$ for
$j \le w^2$ (again assuming $n$ is sufficiently large).

The point of this procedure was to provide a pool of individuals which we can use for
legal swaps. Now we perform another iteration, which produces a run in the interval $[a, b]$.
At any point during the iteration we say that a neighbourhood $\mathcal{N}(u_j)$ or $\mathcal{N}(v_j)$ is tarnished
if any node in that neighbourhood has been involved in a swap since the iteration began.
Using the notation from Definition 3.2, if $\gamma = \beta$ then $\gamma^* = \alpha$, and otherwise $\gamma^* = \beta$.






Step 0. Let $\gamma = \alpha$ if $a$ is of type $\alpha$, otherwise let $\gamma = \beta$. At any given point in the
iteration the value of $\gamma$ specifies the type of run that we are looking to produce, which
may change as the iteration progresses. Set $i := 0$.

Step $s > 0$. We are given a configuration in which all nodes in $[a, a + i]$ are of type $\gamma$,
and no nodes in $[a - w, b + w]$ have yet been involved in swaps, except possibly those in
$[a, a + i]$. Also, at most $i^2$ of the neighbourhoods $\mathcal{N}(u_k)$ or $\mathcal{N}(v_k)$ are tarnished. Let $j$ be
the least $> i$ such that $a + j$ is not of type $\gamma$. If $j > w$ then the iteration is complete, and
we carry out no further instructions. Otherwise, if $a + j$ is unhappy, we divide into two
subcases. If $a + j - 1$ is happy then we can legally swap any unhappy $\gamma$ node from outside
the interval $[a - w, b + w]$ with $a + j$. If $a + j - 1$ is unhappy, we can select any $u_k$ or $v_k$
of type $\gamma$ whose neighbourhood is not tarnished and legally swap that node with $a + j$.
In order to see why this is the case suppose that $\gamma = \alpha$, the case for $\beta$ being similar. Firstly,
the fact that at most $i^2$ of these neighbourhoods are tarnished means that we certainly
have untarnished neighbourhoods of the appropriate type to choose from. Also, before
the iteration began any $u_k$ had at most bias $\Theta(\mathcal{N}(a + j - 1))$ (as that value was defined
then). At the present point in the iteration, $\mathcal{N}(u_k)$ being untarnished means that the
neighbourhood $\mathcal{N}(u_k)$ is unchanged, while any node in the interval $[a - w, b + w]$ which is
now of different type than before, is presently of type $\alpha$. Swapping $u_k$ with $a + j$ will cause
it to have bias at least the same as that which $a + j - 1$ had before the swap. In either of
these two subcases, once the swap is performed, redefine $i := j$ and proceed to step $s + 1$.
The final case to consider is that the individual at $a + j$ is happy. In this case, if $\gamma = \alpha$
then redefine $\gamma = \beta$, or if instead $\gamma = \beta$ then redefine $\gamma = \alpha$. For successive values of $k$,
$1 \le k \le j$, swap an unhappy $\gamma$ individual from outside the interval $[a - w, b + w]$ with
$a + j - k$, causing that node to become happy. Once this sequence of swaps is complete,
redefine $i := j$ and proceed to step $s + 1$.

The simple model. We wish to show that, whatever the initial configuration, with
probability 1 a configuration in which all nodes are of the same type is reached. For the
purposes of this discussion, for $\gamma \in \{\alpha, \beta\}$ we shall say that $\gamma$ is a minority type if there
are at most as many nodes of type $\gamma$ as of type $\gamma^*$ (with $\gamma^*$ defined as in Section 6).
The argument in Section 6 actually suffices to show that if $\gamma$ is a minority type, then
there exists at least one $\gamma$ node which is unhappy, and which would have at least as many
neighbours of its own (new) type if it changed type. Given any configuration, one can
then select $\gamma$ of minority type and successively select nodes of type $\gamma$ which can legally
have their type swapped, until all nodes are of type $\gamma^*$.

References 

[1] I. Benenson, E. Hatna, and E. Or, From Schelling to spatially explicit modelling of urban ethnic and
economic residential dynamics, Sociological Methods & Research, 37(4):463, 2009.

[2] C. Brandt, N. Immorlica, G. Kamath, and R. Kleinberg, An Analysis of One-Dimensional Schelling
Segregation, STOC 2012.

[3] F. Brauer and J. A. Nohel, The Qualitative Theory of Ordinary Differential Equations, Dover, 1989.

[4] E. E. Bruch and R. D. Mare, Neighborhood Choice and Neighborhood Change, American Journal
of Sociology, 112, 667-709, 2006.

[5] W. Clark, Residential segregation in American cities: A review and interpretation, Population
Research and Policy Review, 5(2):95-127, 1986.

[6] M. Emerson, K. Chai, and G. Yancey, Does race matter in residential segregation? Exploring the
preferences of white Americans, American Sociological Review, pages 922-935, 2001.

[7] J. M. Epstein and R. Axtell, Growing Artificial Societies: Social Science from the Bottom Up,
Washington, DC: Brookings Institution Press, 1996.

[8] M. Fossett, Ethnic Preferences, Social Distance Dynamics, and Residential Segregation: Theoretical
Explorations Using Simulation Analysis, Journal of Mathematical Sociology, 30, 185-273, 2006.

[9] G. Fagiolo, M. Valente, and N. J. Vriend, Segregation in Networks, Journal of Economic Behaviour
and Organization, 64, 316-336, 2007.

[10] R. Farley, C. Steeh, T. Jackson, M. Krysan, and K. Reeves, Continued racial residential segregation
in Detroit: Chocolate city, vanilla suburbs revisited, Journal of Housing Research, 4(1):1-38, 1993.

[11] S. Gerhold, L. Glebsky, C. Schneider, H. Weiss, and B. Zimmermann, Limit states for one-dimensional
Schelling segregation models, Communications in Nonlinear Science and Numerical Simulation,
13(10):2236-2245, 2008.

[12] S. Grauwin, F. Goffette-Nagot, and P. Jensen, Dynamic models of residential segregation:
Brief review, analytical resolution and study of the introduction of coordination, Arxiv preprint
arXiv:0907.1777, 2009.

[13] A. D. Henry, P. Pralat, and C. Zhang, Emergence of segregation in evolving social networks,
Proceedings of the National Academy of Sciences, 108(21):8605-8610, May 2011.

[14] W. Hurewicz, Lectures on Ordinary Differential Equations, M.I.T. Press, Cambridge, Massachusetts,
1958.

[15] B. D. McKay, On Littlewood's estimate for the binomial distribution, Adv. Appl. Prob., 21 (1989)
475-478. Available at http://cs.anu.edu.au/~bdm/papers/littlewood2.pdf.

[16] R. Pancs and N. Vriend, Schelling's spatial proximity model of segregation revisited, Journal of Public
Economics, 91(1-2):1-24, 2007.

[17] T. Schelling, Models of segregation, The American Economic Review, pages 488-493, 1969.

[18] A. M. Turing, The Chemical Basis of Morphogenesis, Phil. Trans. R. Soc. London B, 237, pp 37-72,
1952.

[19] M. White, Segregation and diversity measures in population distribution, Population Index,
52(2):198-221, 1986.

[20] D. Vinkovic and A. Kirman, A Physical Analogue of the Schelling Model, Proceedings of the National
Academy of Sciences, no. 51, volume 103, 19261-19265, 2006.

[21] N. C. Wormald, Differential equations for random processes and random graphs, Annals of Applied
Probability, 5:1217-1235, 1995.

[22] H. P. Young, Individual Strategy and Social Structure: An Evolutionary Theory of Institutions,
Princeton, NJ: Princeton University Press, 1998.

[23] J. Zhang, A dynamic model of residential segregation, Journal of Mathematical Sociology,
28(3):147-170, 2004.

[24] J. Zhang, Residential segregation in an all-integrationist world, Journal of Economic Behavior &
Organization, 54(4):533-550, 2004.

[25] J. Zhang, Tipping and residential segregation: A unified Schelling model, Journal of Regional
Science, 51:167-193, Feb. 2011.






George Barmpalias, State Key Lab of Computer Science, Institute of Software, Chi- 
nese Academy of Sciences, Beijing 100190, P.O. Box 8718, People's Republic of China. 
E-mail address: barmpalias@gmail.com 
URL: http://barmpalias.net 

Richard Elwes, School of Mathematics, University of Leeds, LS2 9JT Leeds, United 
Kingdom. 

E-mail address: r.h.elwes@leeds.ac.uk 

Andy Lewis-Pye, School of Mathematics, University of Leeds, LS2 9JT Leeds, United 
Kingdom. 

E-mail address: andy@aemlewis.com
URL: http://aemlewis.co.uk



